Abstract

We study the algebraic varieties defined by the conditional independence
statements of Bayesian networks. A complete algebraic classification is
given for Bayesian networks on at most five random variables. Hidden
variables are related to the geometry of higher secant varieties.

1 Introduction 

The emerging field of algebraic statistics advocates polynomial algebra as 
a tool in the statistical analysis of experiments and discrete data. Statistics 
textbooks define a statistical model as a family of probability distributions, 
and a closer look reveals that these families are often real algebraic varieties: 
they are the zeros of some polynomials in the probability simplex [7], [17].

In this paper we examine directed graphical models for discrete random
variables. Such models are also known as Bayesian networks and they are
widely used in machine learning, bioinformatics and many other applications
[13], [?]. Our aim is to place Bayesian networks into the realm of algebraic
statistics, by developing the necessary theory in algebraic geometry and by
demonstrating the effectiveness of Gröbner bases for this class of models.

Bayesian networks can be described in two possible ways, either by a
recursive factorization of probability distributions or by conditional
independence statements (local and global Markov properties). This is an
instance of the computer algebra principle that varieties can be presented
either parametrically or implicitly [4, §3.3]. The equivalence of these two
representations for Bayesian networks is a well-known theorem in statistics
[13, Theorem 3.27], but, as we shall see, this theorem is surprisingly delicate
and no longer holds when probabilities are replaced by negative reals or
complex numbers. Hence in the usual setting of algebraic geometry, where
the zeros lie in $\mathbb{C}^D$, there are many "distributions" which satisfy the global
Markov property but which do not permit a recursive factorization. We
explain this phenomenon using primary decomposition of polynomial ideals.



This paper is organized as follows. In Section 2 we review the algebraic
theory of conditional independence, and we explicitly determine the Gröbner
basis and primary decomposition arising from the contraction axiom [15],
[21, §2.2.2]. This axiom is shown to fail for negative real numbers. In
Section 3 we introduce the ideals $I_{\mathrm{local}(G)}$ and $I_{\mathrm{global}(G)}$ which represent
a Bayesian network $G$. When $G$ is a forest then these ideals are the toric
ideals derived from undirected graphs as in [?]; see Theorem 6 below.

The recursive factorization of a Bayesian network gives rise to a map
between polynomial rings which is studied in Section 4. The kernel of this
factorization map is the distinguished prime ideal. We prove that this prime
is always a reduced primary component of $I_{\mathrm{local}(G)}$ and $I_{\mathrm{global}(G)}$. Results
in that section include the solutions to Problems 8.11 and 8.12 in [23].

In Sections 5 and 6 we present the results of our computational efforts:
the complete algebraic classification of all Bayesian networks on four
arbitrary random variables and all Bayesian networks on five binary random
variables. The latter involved computing the primary decomposition of 301
ideals generated by a large number of quadrics in 32 unknowns. These
large-scale primary decompositions were carried out in Macaulay2 [10]. Some of
the techniques and software tools we used are described in the Appendix.

The appearance of hidden variables in Bayesian networks leads to
challenging problems in algebraic geometry. Statisticians have known for decades
that the dimension of the corresponding varieties can unexpectedly drop [?],
but the responsible singularities have been studied only quite recently, in [7]
and [?]. In Section 7 we examine the elimination problem arising from
hidden random variables, and we relate it to problems in projective algebraic
geometry. We demonstrate that the naive Bayes model corresponds to the
higher secant varieties of Segre varieties ([?], [?]), and we present several
new results on the dimension and defining ideals of these secant varieties.

Our algebraic theory does not compete with but rather complements
other approaches to conditional independence models. An impressive
combinatorial theory of such models has been developed by Matúš [?] and
Studený [?], culminating in their characterization of all realizable
independence models on four random variables. Sharing many of the views expressed
by these authors, we believe that exploring the precise relation between their
work and ours will be a very fruitful research direction for the near future.
2 Ideals, Varieties and Independence Models



We begin by reviewing the general algebraic framework for independence
models presented in [23, §8]. Let $X_1, \ldots, X_n$ be discrete random variables
where $X_i$ takes values in the finite set $[d_i] = \{1, 2, \ldots, d_i\}$. We write
$D = [d_1] \times [d_2] \times \cdots \times [d_n]$, so that $\mathbb{R}^D$ denotes the real vector space of
$n$-dimensional tables of format $d_1 \times \cdots \times d_n$. We introduce an indeterminate
$p_{u_1 u_2 \cdots u_n}$ which represents the probability of the event
$X_1 = u_1, X_2 = u_2, \ldots, X_n = u_n$. These indeterminates generate the ring
$\mathbb{R}[D]$ of polynomial functions on the space of tables $\mathbb{R}^D$. A conditional
independence statement has the form

A is independent of B given C (in symbols: $A \perp\!\!\!\perp B \mid C$)    (1)

where $A$, $B$ and $C$ are pairwise disjoint subsets of $\{X_1, \ldots, X_n\}$. If $C$ is
empty then (1) means that $A$ is independent of $B$. By [23, Proposition 8.1],
the statement (1) translates into a set of homogeneous quadratic polynomials
in $\mathbb{R}[D]$, and we write $I_{A \perp\!\!\!\perp B \mid C}$ for the ideal generated by these polynomials.
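To make the translation of a statement into quadrics concrete, here is a minimal Python sketch. It assumes our reading of [23, Proposition 8.1]: for each joint state of $C$ one takes the $2 \times 2$-minors of the matrix whose rows are indexed by the joint states of $A$ and whose columns are indexed by the joint states of $B$, with entries the table entries summed over all variables outside $A \cup B \cup C$. The function name `ci_quadrics` and the string output format are our own choices for illustration, not code from the paper.

```python
from itertools import combinations, product


def ci_quadrics(d, A, B, C):
    """2x2-minors generating I_{A indep B | C}, as we read [23, Prop. 8.1].

    d lists d_1, ..., d_n; A, B, C are disjoint sets of 1-based variable
    indices. Variables outside A, B, C are summed out and shown as '+'.
    Each quadric is returned as a string "x*y - z*w".
    """
    n = len(d)
    A, B, C = sorted(A), sorted(B), sorted(C)

    def states(S):
        # All joint states of the variables in S (one empty state if S is empty).
        return list(product(*(range(1, d[i - 1] + 1) for i in S)))

    def coord(a, b, c):
        # Name of the (possibly marginalized) entry with A = a, B = b, C = c.
        idx = ["+"] * n
        for S, values in ((A, a), (B, b), (C, c)):
            for i, v in zip(S, values):
                idx[i - 1] = str(v)
        return "p[" + ",".join(idx) + "]"

    quadrics = []
    for c in states(C):
        # One minor per pair of A-states and pair of B-states.
        for a1, a2 in combinations(states(A), 2):
            for b1, b2 in combinations(states(B), 2):
                quadrics.append(
                    f"{coord(a1, b1, c)}*{coord(a2, b2, c)}"
                    f" - {coord(a1, b2, c)}*{coord(a2, b1, c)}"
                )
    return quadrics
```

For example, `ci_quadrics([2, 2, 2], {1}, {2, 3}, set())` returns the six minors of the $2 \times 4$-matrix that appears in Example 4, and for $d = (3, 3, 2)$ the statement $1 \perp\!\!\!\perp 2 \mid 3$ yields $\binom{3}{2}\binom{3}{2} \cdot 2 = 18$ quadrics, matching the count given before Theorem 1.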

Many statistical models (see e.g. [15], [25]) can be described by a finite
set of independence statements (1). An independence model is any such set:

$$\mathcal{M} = \{A^{(1)} \perp\!\!\!\perp B^{(1)} \mid C^{(1)}, \ldots, A^{(m)} \perp\!\!\!\perp B^{(m)} \mid C^{(m)}\}.$$

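The ideal of the independence model $\mathcal{M}$ is defined as the sum of ideals

$$I_\mathcal{M} = I_{A^{(1)} \perp\!\!\!\perp B^{(1)} \mid C^{(1)}} + \cdots + I_{A^{(m)} \perp\!\!\!\perp B^{(m)} \mid C^{(m)}}.$$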
We wrote code in Macaulay2 [10] and Singular [11] for generating the ideals
$I_\mathcal{M}$. The independence variety is the set $V(I_\mathcal{M})$ of common zeros in $\mathbb{C}^D$ of the
polynomials in $I_\mathcal{M}$. Equivalently, $V(I_\mathcal{M})$ is the set of all $d_1 \times \cdots \times d_n$-tables
with complex number entries which satisfy the conditional independence
statements in $\mathcal{M}$. The variety $V(I_\mathcal{M})$ has three natural subsets:

• the subset of real tables, denoted $V_{\mathbb{R}}(I_\mathcal{M})$,

• the non-negative tables, denoted $V_{\geq}(I_\mathcal{M})$,

• the non-negative tables whose entries sum to one, $V_{\geq}(I_\mathcal{M} + \langle p - 1 \rangle)$.

Here $p$ denotes the sum of all unknowns $p_u$, so that $V_{\geq}(I_\mathcal{M} + \langle p - 1 \rangle)$
is the subset of the probability simplex specified by the model $\mathcal{M}$.

We illustrate these definitions by analyzing the independence model
$\mathcal{M} = \{1 \perp\!\!\!\perp 2 \mid 3,\ 2 \perp\!\!\!\perp 3\}$ for $n = 3$ discrete random variables.
Theorem 1 will be cited in Section 5 and it serves as a preview to Theorem ??.
The ideal $I_\mathcal{M}$ lies in the polynomial ring in $d_1 d_2 d_3$ unknowns $p_{ijk}$. Its
minimal generators are $\binom{d_1}{2}\binom{d_2}{2} d_3$ quadrics of the form $p_{ijk} p_{rsk} - p_{isk} p_{rjk}$
and quadrics of the form $p_{+jk} p_{+st} - p_{+jt} p_{+sk}$. We change coordinates
in $\mathbb{R}[D]$ by replacing each unknown $p_{1jk}$ by $p_{+jk} = \sum_{i=1}^{d_1} p_{ijk}$. This
coordinate change transforms $I_\mathcal{M}$ into a binomial ideal in $\mathbb{R}[D]$.

Theorem 1. The ideal $I_\mathcal{M}$ has a Gröbner basis consisting of squarefree
binomials of degree two, three and four, and it is hence radical. It has $2^{d_3} - 1$
minimal primes, each generated by the $2 \times 2$-minors of a generic matrix.

Proof. The minimal primes of $I_\mathcal{M}$ will be indexed by proper subsets of $[d_3]$.
For each such subset $\sigma$ we introduce the monomial prime

$$M_\sigma = \langle p_{+jk} \mid j \in [d_2],\ k \in \sigma \rangle,$$

and the complementary monomial

$$m_\sigma = \prod_{j=1}^{d_2} \prod_{k \in [d_3] \setminus \sigma} p_{+jk},$$

and we define the ideal

$$P_\sigma := \big( (I_\mathcal{M} + M_\sigma) : m_\sigma^\infty \big).$$

It follows from the general theory of binomial ideals [6] that $P_\sigma$ is a binomial
prime ideal. A closer look reveals that $P_\sigma$ is minimally generated by the
$d_2 \cdot |\sigma|$ variables in $M_\sigma$ together with all the $2 \times 2$-minors of the following
two-dimensional matrices: the matrix $(p_{ijk})$ whose rows are indexed by
$j \in [d_2]$ and whose columns are indexed by pairs $(i, k)$ with $i \in \{+, 2, 3, \ldots, d_1\}$
and $k \in [d_3] \setminus \sigma$, and, for each $k \in \sigma$, the matrix $(p_{ijk})$ whose rows are
indexed by $j \in [d_2]$ and whose columns are indexed by $i \in \{2, 3, \ldots, d_1\}$.

We partition $V(I_\mathcal{M})$ into $2^{d_3}$ strata, each indexed by a subset $\sigma$ of $[d_3]$.
Namely, given a point $(p_{ijk})$ in $V(I_\mathcal{M})$ we define the subset $\sigma$ of $[d_3]$ as
the set of all indices $k$ such that $(p_{+1k}, p_{+2k}, \ldots, p_{+d_2 k})$ is the zero vector.
Note that two tables $(p_{ijk})$ lie in the same stratum if and only if they give
the same $\sigma$. The stratum indexed by $\sigma$ is a dense subset of $V(P_\sigma)$. When
$\sigma = [d_3]$ the stratum consists of all tables such that the line sums $p_{+jk}$
are all zero and, for each fixed $k$, the remaining $(d_1 - 1) \times d_2$-matrix $(p_{ijk})$
with $i \geq 2$ has rank $\leq 1$. So this locus is defined by the prime ideal $P_{[d_3]}$.
Any point in this stratum satisfies the defining equations of $P_\sigma$ for any
proper subset $\sigma$. So the stratum indexed by $[d_3]$ lies in the closure of all
other strata. But all remaining $2^{d_3} - 1$ strata have the property that no
stratum lies in the closure of any other stratum, since the generic point of
$P_\sigma$ lies in exactly one stratum for any proper subset $\sigma$. Hence $V(I_\mathcal{M})$ is the
irredundant union of the irreducible varieties $V(P_\sigma)$ where $\sigma$ runs over all
proper subsets of $[d_3]$. The second assertion in Theorem 1 now follows from
Hilbert's Nullstellensatz.

To prove the first assertion, let us first note that $P_\emptyset$ is the prime ideal of
$2 \times 2$-minors of the $d_2 \times (d_1 d_3)$-matrix $(p_{ijk})$ with rows indexed by $j \in [d_2]$
and columns indexed by pairs $(i, k) \in \{+, 2, 3, \ldots, d_1\} \times [d_3]$. Hence

$$P_\emptyset = I_{2 \perp\!\!\!\perp \{1,3\}}. \tag{2}$$

It is well known (see e.g. [22, Proposition 5.4]) that the quadratic generators

$$p_{ijk}\, p_{rst} - p_{isk}\, p_{rjt} \tag{3}$$

form a reduced Gröbner basis for $P_\emptyset$ with respect to the "diagonal term
order". We modify this Gröbner basis to a Gröbner basis for $I_\mathcal{M}$ as follows:

• if $k = t$ take (3),

• if $i = +$ and $r = +$ take (3),

• if $i = +$ and $r \neq +$ and $k \neq t$ take (3) times $p_{+jt}$ for any $j$,

• if $i \neq +$ and $r \neq +$ and $k \neq t$ take (3) times $p_{+jt}\, p_{+sk}$ for any $j, s$.

All of these binomials lie in $I_\mathcal{M}$ (this can be seen by taking S-pairs of the
generators) and their S-pairs reduce to zero. By Buchberger's criterion,
the given set of quadrics, cubics and quartics is a Gröbner basis, and the
corresponding initial monomial ideal is square-free. This implies that $I_\mathcal{M}$ is
radical (by [23, Proposition 5.3]), and the proof is complete. □

The theorem above can be regarded as an algebraic refinement of the
following well-known rule for conditional independence ([15], [21, §2.2.2]).

Corollary 2. (Contraction Axiom) If a probability distribution on $[d_1] \times
[d_2] \times [d_3]$ satisfies $1 \perp\!\!\!\perp 2 \mid 3$ and $2 \perp\!\!\!\perp 3$, then it also satisfies $2 \perp\!\!\!\perp \{1,3\}$.

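Proof. The non-negative points satisfy $V_{\geq}(P_\sigma) \subseteq V_{\geq}(P_\emptyset)$. Indeed, for a
non-negative table, $p_{+jk} = 0$ for all $j$ and all $k \in \sigma$ forces $p_{ijk} = 0$ for all $i, j$
and $k \in \sigma$, so the matrix in (2) arises from the rank-one matrix of $P_\sigma$ by
adding zero columns. This implies

$$V_{\geq}(I_\mathcal{M}) = V_{\geq}(I_{2 \perp\!\!\!\perp \{1,3\}}).$$

Intersecting with the probability simplex yields the assertion. □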
Theorem 1 shows that the Contraction Axiom fails to hold when
probabilities are replaced by negative real numbers. Any general point on $V(P_\sigma)$
for $\sigma \neq \emptyset$ satisfies $1 \perp\!\!\!\perp 2 \mid 3$ and $2 \perp\!\!\!\perp 3$, but it does not satisfy $2 \perp\!\!\!\perp \{1,3\}$.
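The failure of the Contraction Axiom over the reals can be checked on an explicit table. The Python sketch below uses a $2 \times 2 \times 2$ table of our own choosing (it is not taken from the paper) that lies in the stratum $\sigma = \{2\}$ from the proof of Theorem 1: its entries sum to one and some are negative; it satisfies $1 \perp\!\!\!\perp 2 \mid 3$ and $2 \perp\!\!\!\perp 3$, but not $2 \perp\!\!\!\perp \{1,3\}$. Exact rational arithmetic via `fractions.Fraction` keeps the minor tests free of rounding; all function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def rank_at_most_one(rows):
    """True if every 2x2 minor of the matrix `rows` vanishes."""
    for r1, r2 in combinations(range(len(rows)), 2):
        for c1, c2 in combinations(range(len(rows[0])), 2):
            if rows[r1][c1] * rows[r2][c2] != rows[r1][c2] * rows[r2][c1]:
                return False
    return True


def counterexample():
    """A real table p[(i, j, k)] with entries summing to one.

    Slice k = 1 is constant; in slice k = 2 the rows i = 1, 2 are negatives
    of each other, so every line sum p_{+j2} vanishes.
    """
    x = {1: Fraction(1), 2: Fraction(2)}
    p = {}
    for j in (1, 2):
        p[(1, j, 1)] = p[(2, j, 1)] = Fraction(1)
        p[(1, j, 2)], p[(2, j, 2)] = x[j], -x[j]
    total = sum(p.values())
    return {key: value / total for key, value in p.items()}


def holds_1_2_given_3(p):
    # For each k, the matrix with rows i and columns j has rank <= 1.
    return all(rank_at_most_one([[p[(i, j, k)] for j in (1, 2)] for i in (1, 2)])
               for k in (1, 2))


def holds_2_3(p):
    # The marginal matrix (p_{+jk}) has rank <= 1.
    marg = [[p[(1, j, k)] + p[(2, j, k)] for k in (1, 2)] for j in (1, 2)]
    return rank_at_most_one(marg)


def holds_2_13(p):
    # The matrix with rows j and columns (i, k) has rank <= 1.
    cols = [(i, k) for i in (1, 2) for k in (1, 2)]
    return rank_at_most_one([[p[(i, j, k)] for (i, k) in cols] for j in (1, 2)])


def stratum(p):
    """The subset sigma of [d_3]: indices k whose line sums p_{+jk} all vanish."""
    return {k for k in (1, 2)
            if all(p[(1, j, k)] + p[(2, j, k)] == 0 for j in (1, 2))}
```

Replacing the slice $k = 2$ by a non-negative slice with vanishing line sums forces that slice to be zero, which is exactly the step used in the proof of Corollary 2.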

3 Algebraic Representation of Bayesian Networks 

A Bayesian network is an acyclic directed graph $G$ with vertices $X_1, \ldots, X_n$.
The following notation and terminology is consistent with Lauritzen's book
[13]. The local Markov property on $G$ is the set of independence statements

$$\mathrm{local}(G) = \{ X_i \perp\!\!\!\perp \mathrm{nd}(X_i) \mid \mathrm{pa}(X_i) \,:\, i = 1, 2, \ldots, n \},$$

where $\mathrm{pa}(X_i)$ denotes the set of parents of $X_i$ in $G$ and $\mathrm{nd}(X_i)$ denotes
the set of nondescendents of $X_i$ in $G$, other than $X_i$ and its parents (compare
Example 4). Here $X_j$ is a nondescendent of $X_i$ if there is no directed path from
$X_i$ to $X_j$ in $G$. The global Markov property, $\mathrm{global}(G)$, is the set of
independence statements $A \perp\!\!\!\perp B \mid C$, for any triple $A, B, C$ of subsets of
vertices of $G$ such that $A$ and $B$ are d-separated by $C$. Here two subsets $A$
and $B$ are said to be d-separated by $C$ if all chains from $A$ to $B$ are blocked
by $C$. A chain $\pi$ from $X_i$ to $X_j$ in $G$ is said to be blocked by a set $C$ of
nodes if it contains a vertex $X_b \in \pi$ such that either

• $X_b \in C$ and the arrows of $\pi$ do not meet head-to-head at $X_b$, or

• $X_b \notin C$ and $X_b$ has no descendents in $C$, and the arrows of $\pi$ do meet
head-to-head at $X_b$.
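Both Markov properties can be prototyped directly from these definitions for small graphs. The Python sketch below is our own illustration (all function names are hypothetical): a graph is a dictionary mapping each vertex to its set of children, $\mathrm{nd}(X_i)$ excludes $X_i$ and its parents as in Example 4, and d-separation is tested by enumerating all simple chains and applying the blocking rule above to their interior vertices. Enumerating chains takes exponential time in general, so this is only meant for networks of the size studied in this paper.

```python
def descendants(children, v):
    """Vertices reachable from v by a directed path of length >= 1."""
    seen, stack = set(), list(children.get(v, ()))
    while stack:
        w = stack.pop()
        if w not in seen:
            seen.add(w)
            stack.extend(children.get(w, ()))
    return seen


def local_markov(children, vertices):
    """Statements ({v}, nd(v), pa(v)); statements with empty nd(v) are skipped."""
    statements = []
    for v in vertices:
        parents = {u for u in vertices if v in children.get(u, ())}
        nd = set(vertices) - {v} - descendants(children, v) - parents
        if nd:
            statements.append(({v}, nd, parents))
    return statements


def chains(children, vertices, a, b):
    """All simple chains from a to b in the underlying undirected graph."""
    nbrs = {v: set() for v in vertices}
    for u in vertices:
        for w in children.get(u, ()):
            nbrs[u].add(w)
            nbrs[w].add(u)
    found = []

    def extend(path):
        if path[-1] == b:
            found.append(path)
            return
        for w in nbrs[path[-1]]:
            if w not in path:
                extend(path + [w])

    extend([a])
    return found


def blocked(children, path, C):
    """The blocking rule above, applied to the interior vertices of the chain."""
    for prev, x, nxt in zip(path, path[1:], path[2:]):
        head_to_head = x in children.get(prev, ()) and x in children.get(nxt, ())
        if x in C and not head_to_head:
            return True
        if x not in C and head_to_head and not (descendants(children, x) & C):
            return True
    return False


def d_separated(children, vertices, A, B, C):
    """True if every chain from A to B is blocked by C (A, B, C disjoint sets)."""
    return all(blocked(children, path, C)
               for a in A for b in B
               for path in chains(children, vertices, a, b))
```

For the network of Example 4 (`{3: {2}}` on vertices `[1, 2, 3]`), `local_markov` reproduces the parents and nondescendents listed there, and for the network $K_{2,3}$ of Proposition 9 it returns the four statements of $\mathrm{local}(G)$ given in its proof.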

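For any Bayesian network $G$, we have $\mathrm{local}(G) \subseteq \mathrm{global}(G)$, and this
implies the following containment relations between ideals and varieties:

$$I_{\mathrm{local}(G)} \subseteq I_{\mathrm{global}(G)} \quad\text{and}\quad V(I_{\mathrm{local}(G)}) \supseteq V(I_{\mathrm{global}(G)}). \tag{4}$$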

The latter inclusion extends to the three real varieties listed above, and 
we shall discuss when equality holds. First, however, we give an algebraic 
version of the description of Bayesian networks by recursive factorizations. 

Consider the set of parents of the $j$-th node, $\mathrm{pa}(X_j) = \{X_{i_1}, \ldots, X_{i_r}\}$,
and consider any event $X_j = u_0$ conditioned on $X_{i_1} = u_1, \ldots, X_{i_r} = u_r$,
where $1 \leq u_0 \leq d_j$, $1 \leq u_1 \leq d_{i_1}, \ldots, 1 \leq u_r \leq d_{i_r}$. We introduce an
unknown $q^{(j)}_{u_0 u_1 \cdots u_r}$ to denote the conditional probability of this event, and
we subject these unknowns to the linear relations $\sum_{v=1}^{d_j} q^{(j)}_{v u_1 \cdots u_r} = 1$ for all
$1 \leq u_1 \leq d_{i_1}, \ldots, 1 \leq u_r \leq d_{i_r}$. Thus, we have introduced $(d_j - 1) d_{i_1} \cdots d_{i_r}$
free unknowns for the vertex $j$. Let $E$ denote the set of these unknowns $q^{(j)}_{u_0 u_1 \cdots u_r}$
for all $j \in \{1, \ldots, n\}$, and let $\mathbb{R}[E]$ denote the polynomial ring they generate.
If the $n$ random variables are binary ($d_i = 2$ for all $i$) then the notation
for $\mathbb{R}[E]$ can be simplified by dropping the first lower index and writing

$$q^{(j)}_{u_1 \cdots u_r} := q^{(j)}_{1 u_1 \cdots u_r} = 1 - q^{(j)}_{2 u_1 \cdots u_r}.$$

In the binary case, $\mathbb{R}[E]$ is a polynomial ring in $\sum_{j=1}^{n} 2^{|\mathrm{pa}(j)|}$ unknowns.

The factorization of probability distributions according to $G$ defines a
polynomial map $\phi : \mathbb{R}^E \to \mathbb{R}^D$. By restricting to non-negative reals we get
an induced map $\phi_{\geq}$. These maps are specified by the ring homomorphism
$\Phi : \mathbb{R}[D] \to \mathbb{R}[E]$ which takes the unknown $p_{u_1 u_2 \cdots u_n}$ to the product of
the expressions $q^{(j)}_{u_j u_{i_1} \cdots u_{i_r}}$ (where $\mathrm{pa}(X_j) = \{X_{i_1}, \ldots, X_{i_r}\}$) as $j$ runs over
$\{1, \ldots, n\}$. The image of $\phi$ lies in the independence variety $V(I_{\mathrm{global}(G)})$ or,
equivalently, the independence ideal $I_{\mathrm{global}(G)}$ is contained in the prime ideal
$\ker(\Phi)$. The Factorization Theorem for Bayesian networks [13, Theorem 3.27] states:

Theorem 3. The following four subsets of the probability simplex coincide:

$$V_{\geq}(I_{\mathrm{local}(G)} + \langle p - 1 \rangle) = V_{\geq}(I_{\mathrm{global}(G)} + \langle p - 1 \rangle) = V_{\geq}(\ker(\Phi)) = \mathrm{image}(\phi_{\geq}).$$
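The recursive factorization map $\phi$ can be evaluated numerically in the binary case. The Python sketch below is our own illustration, assuming the binary convention above, i.e. $q^{(j)}_{u_1 \cdots u_r}$ is the probability that $X_j = 1$ given that its parents are in state $u_1 \cdots u_r$; `factorize` and `num_parameters` are hypothetical names. It returns the table $(p_u)$ as a dictionary keyed by the states $u \in \{1, 2\}^n$.

```python
from fractions import Fraction
from itertools import product


def factorize(parents, q):
    """Joint table of a binary Bayesian network from its parameters.

    parents[j] is the tuple of parents of node j (nodes 1..n).
    q[j][s] plays the role of q^{(j)}_s = Prob(X_j = 1 | parents in state s);
    Prob(X_j = 2 | s) is 1 - q[j][s], following the binary convention.
    """
    n = len(parents)
    p = {}
    for u in product((1, 2), repeat=n):
        value = Fraction(1)
        for j in range(1, n + 1):
            s = tuple(u[i - 1] for i in parents[j])      # state of the parents of j
            value *= q[j][s] if u[j - 1] == 1 else 1 - q[j][s]
        p[u] = value
    return p


def num_parameters(parents):
    """Number of unknowns of R[E] in the binary case: the sum of 2^{|pa(j)|}."""
    return sum(2 ** len(pa) for pa in parents.values())
```

With `parents = {1: (), 2: (3,), 3: ()}` (the network of Example 4), `num_parameters(parents)` equals $1 + 2 + 1 = 4$, matching $E = \{q^1, q^2_1, q^2_2, q^3\}$, and for parameters in $[0, 1]$ the resulting table is a point of $\mathrm{image}(\phi_{\geq})$.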

Example 4. Let $G$ be the network on three binary random variables which
has a single directed edge from 3 to 2. The parents and nondescendents are

$$\mathrm{pa}(1) = \emptyset,\ \mathrm{nd}(1) = \{2,3\}, \quad \mathrm{pa}(2) = \{3\},\ \mathrm{nd}(2) = \{1\}, \quad \mathrm{pa}(3) = \emptyset,\ \mathrm{nd}(3) = \{1\}.$$

The resulting conditional independence statements are

$$\mathrm{local}(G) = \mathrm{global}(G) = \{ 1 \perp\!\!\!\perp 3,\ 1 \perp\!\!\!\perp 2 \mid 3,\ 1 \perp\!\!\!\perp \{2,3\} \}.$$

The ideal expressing the first two statements is contained in the ideal
expressing the third statement, and we find that $I_{\mathrm{local}(G)} = I_{1 \perp\!\!\!\perp \{2,3\}}$ is the
ideal generated by the six $2 \times 2$-subdeterminants of the $2 \times 4$-matrix



$$\begin{pmatrix} p_{111} & p_{112} & p_{121} & p_{122} \\ p_{211} & p_{212} & p_{221} & p_{222} \end{pmatrix} \tag{5}$$



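This ideal is prime and its generators form a Gröbner basis. The
Factorization Theorem is understood as follows for this example. We have
$E = \{q^1, q^2_1, q^2_2, q^3\}$, and the ring map $\Phi$ takes the matrix (5) to

$$\begin{pmatrix}
q^1 q^2_1 q^3 & q^1 q^2_2 (1-q^3) & q^1 (1-q^2_1) q^3 & q^1 (1-q^2_2)(1-q^3) \\
(1-q^1) q^2_1 q^3 & (1-q^1) q^2_2 (1-q^3) & (1-q^1)(1-q^2_1) q^3 & (1-q^1)(1-q^2_2)(1-q^3)
\end{pmatrix}.$$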
The map $\phi$ from $\mathbb{R}^E$ to $\mathbb{R}^D$ corresponding to the ring map $\Phi : \mathbb{R}[D] \to \mathbb{R}[E]$
gives a parametrization of all $2 \times 4$-matrices of rank 1 whose entries sum to
1. The Factorization Theorem for $G$ is the same statement for non-negative
matrices. The kernel of $\Phi$ is exactly equal to $I_{\mathrm{local}(G)} + \langle p - 1 \rangle$. □
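The rank-one claim in Example 4 can be verified by hand from the displayed parametrization: the image of (5) is an outer product, and its entries sum to one. Writing this out (a short derivation of our own):

$$\Phi\begin{pmatrix} p_{111} & \cdots & p_{122} \\ p_{211} & \cdots & p_{222} \end{pmatrix}
= \begin{pmatrix} q^1 \\ 1 - q^1 \end{pmatrix}
\begin{pmatrix} q^2_1 q^3 & q^2_2 (1-q^3) & (1-q^2_1) q^3 & (1-q^2_2)(1-q^3) \end{pmatrix},$$

$$q^2_1 q^3 + (1-q^2_1) q^3 + q^2_2 (1-q^3) + (1-q^2_2)(1-q^3) = q^3 + (1 - q^3) = 1,$$

so every matrix in the image has rank at most 1, and its entries sum to $\big(q^1 + (1 - q^1)\big) \cdot 1 = 1$.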

Our aim is to decide to what extent the Factorization Theorem is valid
over all real and all complex numbers. The corresponding algebraic question
is to study the ideal $I_{\mathrm{local}(G)}$ and to determine its primary decomposition.
Let us begin by considering all Bayesian networks on three random variables.
We shall prove that for such small networks the ideal $I_{\mathrm{local}(G)}$ is always
prime and coincides with the kernel of $\Phi$. The following result is valid for
arbitrary positive integers $d_1, d_2, d_3$; it is not restricted to the binary case.

Proposition 5. For any Bayesian network $G$ on three discrete random
variables, the ideal $I_{\mathrm{local}(G)}$ is prime, and it has a quadratic Gröbner basis.

Proof. We completely classify all possible cases. If $G$ is the complete graph,
directed acyclically, then $\mathrm{local}(G)$ contains no nontrivial independence
statements, so $I_{\mathrm{local}(G)}$ is the zero ideal. In what follows we always exclude this
case. There are five isomorphism types of (non-complete) directed acyclic
graphs on three nodes. They correspond to the rows of the following table:



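$$
\begin{array}{lll}
\text{Graph} & \text{Local/Global Markov property} & \text{Independence ideal} \\ \hline
3 \quad 2 \quad 1 & 1 \perp\!\!\!\perp \{2,3\},\ 2 \perp\!\!\!\perp \{1,3\},\ 3 \perp\!\!\!\perp \{1,2\} & I_{\mathrm{Segre}} \\
3 \to 2 \quad 1 & 1 \perp\!\!\!\perp 3,\ 1 \perp\!\!\!\perp 2 \mid 3,\ 1 \perp\!\!\!\perp \{2,3\} & I_{1 \perp\!\!\!\perp \{2,3\}} \\
3 \to 2 \to 1 & 1 \perp\!\!\!\perp 3 \mid 2 & I_{1 \perp\!\!\!\perp 3 \mid 2} \\
1 \leftarrow 3 \to 2 & 1 \perp\!\!\!\perp 2 \mid 3 & I_{1 \perp\!\!\!\perp 2 \mid 3} \\
3 \to 1 \leftarrow 2 & 2 \perp\!\!\!\perp 3 & I_{2 \perp\!\!\!\perp 3}
\end{array}
$$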





The third and fourth network represent the same independence model, up to
relabeling the nodes. In all cases except for the first, the ideal $I_{\mathrm{local}(G)}$ is of the
form $I_{A \perp\!\!\!\perp B \mid C}$, i.e., it is specified by a single independence statement. It was
shown in [23, Lemma 8.2] that such ideals are prime. They are determinantal ideals
and well known to possess a quadratic Gröbner basis. The only exceptional graph
is the empty graph, which leads to the model of complete independence
$1 \perp\!\!\!\perp \{2,3\}, 2 \perp\!\!\!\perp \{1,3\}, 3 \perp\!\!\!\perp \{1,2\}$. The corresponding ideal defines the Segre
embedding of the product of three projective spaces $\mathbb{P}^{d_1-1} \times \mathbb{P}^{d_2-1} \times \mathbb{P}^{d_3-1}$
into $\mathbb{P}^{d_1 d_2 d_3 - 1}$. This ideal is prime and has a quadratic Gröbner basis. □

A network $G$ is a directed forest if every node has at most one parent.
The conclusion of Proposition 5 also holds for the global Markov ideals of directed
forests on any number of nodes. Proposition ?? will show that the direction of the
edges is crucial: it is not sufficient to assume that the underlying undirected graph is a forest.

Theorem 6. Let $G$ be a directed forest. Then $I_{\mathrm{global}(G)}$ is prime and has a
quadratic Gröbner basis. These properties generally fail for $I_{\mathrm{local}(G)}$.

Proof. For a directed forest, no two arrows of a chain can meet head-to-head,
since every node has at most one parent, so the definition of a blocked chain
reads as follows. A chain $\pi$ from $X_i$ to $X_j$ in $G$ is blocked by a set $C$ if it
contains a vertex $X_b \in \pi \cap C$. Hence, $C$ d-separates $A$ from $B$ if and only if $C$
separates $A$ from $B$ in the undirected graph underlying $G$. Thus, [?, Theorem 12]
implies that $I_{\mathrm{global}(G)}$ coincides with the distinguished prime ideal $\ker(\Phi)$, and
this ideal has a quadratic Gröbner basis. The second assertion is proved by
the networks 18 and 26 in Table 1. See also [23, Example 8.8]. □

We close this section with a conjectured characterization of the global 
Markov property on a Bayesian network G in terms of commutative algebra. 

Conjecture 7. $I_{\mathrm{global}(G)}$ is the ideal generated by all quadrics in $\ker(\Phi)$.



4 The Distinguished Component 

In what follows we shall assume that every edge $(i, j)$ of the Bayesian network
$G$ satisfies $i > j$. In particular, the node 1 is always a sink and the node
$n$ is always a source. For any integer $r \in [n]$ and $u_i \in [d_i]$ as before, we
abbreviate the marginalization over the first $r$ random variables as follows:

$$p_{+ \cdots + u_{r+1} \cdots u_n} := \sum_{i_1=1}^{d_1} \sum_{i_2=1}^{d_2} \cdots \sum_{i_r=1}^{d_r} p_{i_1 i_2 \cdots i_r u_{r+1} \cdots u_n}.$$

This is a linear form in our polynomial ring $\mathbb{R}[D]$. We denote by $p$ the
product of all of these linear forms. Thus the equation $p = 0$ defines
a hyperplane arrangement in $\mathbb{R}^D$. We shall prove that the ideal $I_{\mathrm{local}(G)}$ is
prime locally outside this hyperplane arrangement, and hence so is $I_{\mathrm{global}(G)}$.
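The marginal linear forms whose product is $p$ are easy to enumerate. A minimal Python sketch, with hypothetical helper names and our own string notation `p[+,+,1,2]` for the form $p_{++12}$; there are $\sum_{r=1}^{n} d_{r+1} \cdots d_n$ such forms, for instance $8 + 4 + 2 + 1 = 15$ for four binary variables.

```python
from itertools import product


def marginal_forms(d):
    """Names of the linear forms p_{+...+u_{r+1}...u_n} for r = 1, ..., n.

    d lists d_1, ..., d_n; the product p of this section is the product of
    all forms returned here (the case r = n is the sum of all unknowns).
    """
    n = len(d)
    forms = []
    for r in range(1, n + 1):
        for tail in product(*(range(1, di + 1) for di in d[r:])):
            forms.append("p[" + ",".join(["+"] * r + [str(u) for u in tail]) + "]")
    return forms


def marginal_value(table, r, tail):
    """Value of p_{+...+ tail} (first r indices summed out) on a dict table."""
    return sum(v for idx, v in table.items() if idx[r:] == tuple(tail))
```

On a table with strictly positive entries every one of these forms is positive, so such tables lie off the hyperplane arrangement $p = 0$.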
The following theorem provides the solution to [23, Problem 8.12].

Theorem 8. The prime ideal $\ker(\Phi)$ is a minimal primary component of
both of the ideals $I_{\mathrm{local}(G)}$ and $I_{\mathrm{global}(G)}$. More precisely,

$$(I_{\mathrm{local}(G)} : p^\infty) = (I_{\mathrm{global}(G)} : p^\infty) = \ker(\Phi). \tag{6}$$

The prime ideal $\ker(\Phi)$ is called the distinguished component. It can
be characterized as the set of all homogeneous polynomial functions on $\mathbb{R}^D$
which vanish on all probability distributions that factor according to $G$.
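Proof. We relabel $G$ so that $\mathrm{pa}(1) = \{2, 3, \ldots, r\}$ and $\mathrm{nd}(1) = \{r+1, \ldots, n\}$.
Let $A$ denote a set of $(d_1 - 1) d_2 \cdots d_r$ new unknowns $a_{i_1 i_2 \cdots i_r}$, for $i_1 > 1$,
defining a polynomial ring $\mathbb{R}[A]$. Define $d_2 \cdots d_r$ linear polynomials

$$a_{1 i_2 \cdots i_r} := 1 - \sum_{i=2}^{d_1} a_{i i_2 \cdots i_r}.$$

Let $Q$ denote a set of $d_2 \cdots d_n$ new unknowns $q_{i_2 \cdots i_r i_{r+1} \cdots i_n} = q_{i_2 \cdots i_n}$, defining
a polynomial ring $\mathbb{R}[Q]$. We introduce the partial factorization map

$$\Psi : \mathbb{R}[D] \to \mathbb{R}[A \cup Q], \quad p_{i_1 i_2 \cdots i_n} \mapsto a_{i_1 \cdots i_r} \cdot q_{i_2 \cdots i_n}. \tag{7}$$

The kernel of $\Psi$ is precisely the ideal $I_1 := I_{1 \perp\!\!\!\perp \mathrm{nd}(1) \mid \mathrm{pa}(1)}$. Note that, since
$\sum_{i_1=1}^{d_1} a_{i_1 i_2 \cdots i_r} = 1$,

$$\Psi(p_{+ i_2 \cdots i_n}) = q_{i_2 \cdots i_n}.$$

Therefore $\Psi$ becomes an epimorphism if we localize $\mathbb{R}[D]$ at the product $p_1$
of the $p_{+ i_2 \cdots i_n}$ and we localize $\mathbb{R}[A \cup Q]$ at the product of the $q_{i_2 \cdots i_n}$. This implies
that any ideal $L$ in the polynomial ring $\mathbb{R}[D]$ satisfies the identity

$$\Psi^{-1}(\Psi(L)) = \big( (L + I_1) : p_1^\infty \big). \tag{8}$$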

Let $G'$ denote the graph obtained from $G$ by removing the sink 1 and all
edges incident to 1. We regard $I_{\mathrm{local}(G')}$ as an ideal in $\mathbb{R}[Q]$. We modify the
set of independence statements $\mathrm{local}(G)$ by removing 1 from the sets $\mathrm{nd}(i)$
for any $i \geq 2$. Let $J \subset \mathbb{R}[D]$ be the ideal corresponding to these modified
independence statements, so that $\Psi(J) = I_{\mathrm{local}(G')}$. Note that

$$J + I_1 \subseteq I_{\mathrm{local}(G)} \subseteq I_{\mathrm{global}(G)} \subseteq \ker(\Phi),$$

so it suffices to show that $(J + I_1) : p^\infty = \ker(\Phi)$. The map $\Phi$ factors as

$$\mathbb{R}[D] \xrightarrow{\ \Psi\ } \mathbb{R}[A \cup Q] \xrightarrow{\ \Phi'\ } \mathbb{R}[A \cup E'] = \mathbb{R}[E], \tag{9}$$

where $\Phi'$ is the factorization map coming from the graph $G'$, extended to
be the identity on the variables $A$. By induction on the number of vertices,
we may assume that Theorem 8 holds for the smaller graph $G'$, i.e.,

$$\ker(\Phi') = (I_{\mathrm{local}(G')} : q_2^\infty) = \Psi(J : p_2^\infty), \tag{10}$$

where $q_2 = \Psi(p_2)$ and $p_2$ is the product of the linear forms $p_{+ \cdots + u_{s+1} \cdots u_n}$
with $s \geq 2$ initial $+$'s. Therefore

$$\ker(\Phi) = \Psi^{-1}\big(\Psi(J : p_2^\infty)\big). \tag{11}$$

Applying (8) we get $\ker(\Phi) = \big((J : p_2^\infty) + I_1\big) : p_1^\infty = (J + I_1) : p^\infty$. □
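The identity $\Psi(p_{+ i_2 \cdots i_n}) = q_{i_2 \cdots i_n}$ used in the proof above can be checked numerically. The Python sketch below is our own illustration (all names are hypothetical): it evaluates the partial factorization map (7) at random rational values of the unknowns in $A$ and $Q$, computes $a_{1 i_2 \cdots i_r}$ as the dependent linear polynomial, and sums out the first index. Because this is a polynomial identity, the values need not be probabilities.

```python
import random
from fractions import Fraction
from itertools import product


def psi_image(d, r, a_free, q):
    """Evaluate the partial factorization map (7) at given values of A and Q.

    d = (d_1, ..., d_n) and pa(1) = {2, ..., r}.
    a_free[(i1, s)] is a_{i1 s} for i1 = 2..d_1 and a parent state s = (i_2..i_r);
    a_{1 s} is the dependent value 1 - sum_{i >= 2} a_{i s}.
    q[(i_2, ..., i_n)] is the value of q_{i_2 ... i_n}.
    """
    p = {}
    for idx in product(*(range(1, di + 1) for di in d)):
        s = idx[1:r]
        if idx[0] == 1:
            a = 1 - sum(a_free[(i, s)] for i in range(2, d[0] + 1))
        else:
            a = a_free[(idx[0], s)]
        p[idx] = a * q[idx[1:]]
    return p


def sum_out_first(p):
    """The linear forms p_{+ i_2 ... i_n} evaluated on the table p."""
    marg = {}
    for idx, value in p.items():
        marg[idx[1:]] = marg.get(idx[1:], 0) + value
    return marg


def random_parameters(d, r, seed=0):
    """Random rational values for the free a's and the q's (not necessarily probabilities)."""
    rng = random.Random(seed)
    a_free = {(i, s): Fraction(rng.randint(-5, 5), 7)
              for i in range(2, d[0] + 1)
              for s in product(*(range(1, di + 1) for di in d[1:r]))}
    q = {t: Fraction(rng.randint(1, 9), 10)
         for t in product(*(range(1, di + 1) for di in d[1:]))}
    return a_free, q
```

For example, with `d = (3, 2, 2)` and `r = 2` (so $\mathrm{pa}(1) = \{2\}$ and $\mathrm{nd}(1) = \{3\}$), `sum_out_first(psi_image(d, 2, *random_parameters(d, 2)))` returns exactly the chosen `q`, and the image satisfies $1 \perp\!\!\!\perp 3 \mid 2$.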
By following the technique of the proof, we can replace $p_1$ by the product
of a much smaller number of the linear forms $p_{+ u_2 \cdots u_n}$. In fact, we need only
take the linear forms $p_{+ u_2 \cdots u_r 1 \cdots 1}$. Hence, by induction, $p$ can be replaced by a
much smaller product of linear forms. This observation proved to be crucial for
computing some of the tough primary decompositions in Section 6.

As a corollary we derive an algebraic proof of the Factorization Theorem. 

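Proof of Theorem 3. We use induction on the number of nodes to show that
every point in $V_{\geq}(I_{\mathrm{local}(G)} + \langle p - 1 \rangle)$ also lies in $\mathrm{image}(\phi_{\geq})$. Such a point is a
homomorphism $\tau : \mathbb{R}[D] \to \mathbb{R}$ with the property that $\tau$ is zero on $I_{\mathrm{local}(G)}$
and its values on the indeterminates $p_{u_1 \cdots u_n}$ are non-negative and sum to
1. The map $\tau$ can be extended to a homomorphism $\tau' : \mathbb{R}[Q \cup A] \to \mathbb{R}$
as follows. We first set $\tau'(q_{i_2 \cdots i_n}) = \tau(p_{+ i_2 \cdots i_n})$. For $i_1 \geq 2$, if this real number
is positive for some choice of $i_{r+1}, \ldots, i_n$, then we set
$\tau'(a_{i_1 \cdots i_r}) = \tau(p_{i_1 i_2 \cdots i_n}) / \tau(p_{+ i_2 \cdots i_n})$ for such a choice (the ratio does not depend
on the choice, because $\tau$ vanishes on $I_1 \subseteq I_{\mathrm{local}(G)}$), and otherwise we
set $\tau'(a_{i_1 \cdots i_r}) = 0$. Our non-negativity hypothesis implies that $\tau$ coincides
with the composition of $\tau'$ and $\Psi$, i.e., the point $\tau$ is the image of $\tau'$ under
the induced map $\mathbb{R}^{Q \cup A} \to \mathbb{R}^D$. The conclusion now follows by induction. □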
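The construction of $\tau'$ can likewise be sketched in Python for a small case. The code below is our own illustration with hypothetical names: it takes a non-negative table satisfying $1 \perp\!\!\!\perp \mathrm{nd}(1) \mid \mathrm{pa}(1)$, sets the $q$'s to the marginals $p_{+ i_2 \cdots i_n}$, computes each free $a_{i_1 \cdots i_r}$ ($i_1 \geq 2$) from a completion with positive marginal (we simply take the first one found), uses $0$ when all such marginals vanish, and then applies $\Psi$. In the example table, $d = (2, 2, 2)$, $\mathrm{pa}(1) = \{2\}$, and every entry with $i_2 = 1$ is zero, so the zero branch is exercised.

```python
from fractions import Fraction
from itertools import product


def lift(p, d, r):
    """Values of tau' on Q and on the free unknowns of A, following the proof.

    p is a non-negative table {(i_1, ..., i_n): value}, assumed to satisfy
    1 indep nd(1) | pa(1) with pa(1) = {2, ..., r}.
    """
    q = {}
    for idx, value in p.items():
        q[idx[1:]] = q.get(idx[1:], 0) + value          # tau'(q) = tau(p_{+ i_2 ... i_n})
    a_free = {}
    for s in product(*(range(1, di + 1) for di in d[1:r])):
        positive = [t for t in q if t[:r - 1] == s and q[t] > 0]
        for i1 in range(2, d[0] + 1):
            if positive:                                  # ratio at a completion with positive marginal
                a_free[(i1, s)] = p[(i1,) + positive[0]] / q[positive[0]]
            else:                                         # every marginal vanishes: use 0
                a_free[(i1, s)] = Fraction(0)
    return a_free, q


def psi_image(d, r, a_free, q):
    """The partial factorization map (7): p_{i_1...i_n} = a_{i_1...i_r} * q_{i_2...i_n}."""
    p = {}
    for idx in product(*(range(1, di + 1) for di in d)):
        s = idx[1:r]
        if idx[0] == 1:
            a = 1 - sum(a_free[(i, s)] for i in range(2, d[0] + 1))
        else:
            a = a_free[(idx[0], s)]
        p[idx] = a * q[idx[1:]]
    return p


EXAMPLE = {  # d = (2, 2, 2), pa(1) = {2}; all entries with i_2 = 1 vanish
    (1, 1, 1): Fraction(0), (2, 1, 1): Fraction(0),
    (1, 1, 2): Fraction(0), (2, 1, 2): Fraction(0),
    (1, 2, 1): Fraction(1, 10), (2, 2, 1): Fraction(3, 10),
    (1, 2, 2): Fraction(3, 20), (2, 2, 2): Fraction(9, 20),
}
```

Applying `psi_image` to the output of `lift(EXAMPLE, (2, 2, 2), 2)` reproduces `EXAMPLE` exactly. A table with negative entries could have a vanishing marginal without its entries vanishing, and then the zero branch would not recover it; this is where the non-negativity hypothesis is used.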

We close this section by presenting our solution to [23, Problem 8.11].

Proposition 9. There exists a Bayesian network $G$ on five binary random
variables such that the local Markov ideal $I_{\mathrm{local}(G)}$ is not radical.

Proof. Let $G$ be the complete bipartite network $K_{2,3}$ with nodes $\{1, 5\}$ and
$\{2, 3, 4\}$ and directed edges $(5,2), (5,3), (5,4), (2,1), (3,1), (4,1)$. Then

$$\mathrm{local}(G) = \{ 1 \perp\!\!\!\perp 5 \mid \{2,3,4\},\ 2 \perp\!\!\!\perp \{3,4\} \mid 5,\ 3 \perp\!\!\!\perp \{2,4\} \mid 5,\ 4 \perp\!\!\!\perp \{2,3\} \mid 5 \}.$$

The polynomial ring $\mathbb{R}[D]$ has 32 indeterminates $p_{11111}, p_{11112}, \ldots, p_{22222}$.
The ideal $I_{\mathrm{local}(G)}$ is minimally generated by eight binomial quadrics

$$p_{1 u_2 u_3 u_4 1} \cdot p_{2 u_2 u_3 u_4 2} - p_{1 u_2 u_3 u_4 2} \cdot p_{2 u_2 u_3 u_4 1}, \qquad u_2, u_3, u_4 \in \{1, 2\},$$

and eighteen non-binomial quadrics

$$\begin{aligned}
& p_{+122u_5} p_{+221u_5} - p_{+121u_5} p_{+222u_5}, && p_{+212u_5} p_{+221u_5} - p_{+211u_5} p_{+222u_5}, \\
& p_{+112u_5} p_{+221u_5} - p_{+111u_5} p_{+222u_5}, && p_{+122u_5} p_{+212u_5} - p_{+112u_5} p_{+222u_5}, \\
& p_{+121u_5} p_{+212u_5} - p_{+111u_5} p_{+222u_5}, && p_{+122u_5} p_{+211u_5} - p_{+111u_5} p_{+222u_5}, \\
& p_{+112u_5} p_{+211u_5} - p_{+111u_5} p_{+212u_5}, && p_{+121u_5} p_{+211u_5} - p_{+111u_5} p_{+221u_5}, \\
& p_{+112u_5} p_{+121u_5} - p_{+111u_5} p_{+122u_5}, && u_5 \in \{1, 2\}.
\end{aligned}$$

These nine equations (for a fixed value of $u_5$) define the Segre embedding of
$\mathbb{P}^1 \times \mathbb{P}^1 \times \mathbb{P}^1$ in $\mathbb{P}^7$, as in [23, eqn. (8.6), page 103]. Consider the polynomial

$$f = p_{+1112}\, p_{+2222}\, ( p_{12221} p_{12212} p_{12122} p_{12111} - p_{12112} p_{12121} p_{12211} p_{12222} ).$$
By computing a Gröbner basis, it can be checked that $f^2$ lies in $I_{\mathrm{local}(G)}$
but $f$ does not lie in $I_{\mathrm{local}(G)}$. Hence $I_{\mathrm{local}(G)}$ is not a radical ideal. The
primary decomposition of this ideal will be described in Example 18. □
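The Gröbner basis computation showing $f^2 \in I_{\mathrm{local}(G)}$ and $f \notin I_{\mathrm{local}(G)}$ needs a computer algebra system such as Macaulay2 and is not reproduced here. As a lighter sanity check of the formulas above, the Python sketch below (our own; names are hypothetical) builds a point of $\mathrm{image}(\phi)$ for the binary network $K_{2,3}$ from random rational parameters and verifies in exact arithmetic that all 26 listed quadrics and the polynomial $f$ vanish there. This is consistent with $I_{\mathrm{local}(G)} \subseteq \ker(\Phi)$ and $f \in \ker(\Phi)$, but it does not test non-radicality.

```python
import random
from fractions import Fraction
from itertools import product

STATES = (1, 2)


def bern(prob_one, u):
    """Prob(X = u) for a binary variable with Prob(X = 1) = prob_one."""
    return prob_one if u == 1 else 1 - prob_one


def k23_point(seed=0):
    """A factorizing table for the binary network with edges 5->2,3,4 and 2,3,4->1."""
    rng = random.Random(seed)

    def prob():
        return Fraction(rng.randint(1, 9), 10)

    q5 = prob()                                                    # Prob(X5 = 1)
    qmid = {(j, u5): prob() for j in (2, 3, 4) for u5 in STATES}   # Prob(Xj = 1 | X5 = u5)
    q1 = {s: prob() for s in product(STATES, repeat=3)}            # Prob(X1 = 1 | X2, X3, X4)
    p = {}
    for u1, u2, u3, u4, u5 in product(STATES, repeat=5):
        p[(u1, u2, u3, u4, u5)] = (
            bern(q5, u5) * bern(qmid[(2, u5)], u2) * bern(qmid[(3, u5)], u3)
            * bern(qmid[(4, u5)], u4) * bern(q1[(u2, u3, u4)], u1))
    return p


def plus(p, a, b, c, u5):
    """The linear form p_{+abc u5}."""
    return p[(1, a, b, c, u5)] + p[(2, a, b, c, u5)]


def binomial_quadrics(p):
    """The eight quadrics expressing 1 indep 5 | {2,3,4}."""
    return [p[(1, a, b, c, 1)] * p[(2, a, b, c, 2)] - p[(1, a, b, c, 2)] * p[(2, a, b, c, 1)]
            for a, b, c in product(STATES, repeat=3)]


def segre_quadrics(p, u5):
    """The nine quadrics listed in the proof, for a fixed value of u5."""
    def s(a, b, c):
        return plus(p, a, b, c, u5)
    return [
        s(1, 2, 2) * s(2, 2, 1) - s(1, 2, 1) * s(2, 2, 2),
        s(2, 1, 2) * s(2, 2, 1) - s(2, 1, 1) * s(2, 2, 2),
        s(1, 1, 2) * s(2, 2, 1) - s(1, 1, 1) * s(2, 2, 2),
        s(1, 2, 2) * s(2, 1, 2) - s(1, 1, 2) * s(2, 2, 2),
        s(1, 2, 1) * s(2, 1, 2) - s(1, 1, 1) * s(2, 2, 2),
        s(1, 2, 2) * s(2, 1, 1) - s(1, 1, 1) * s(2, 2, 2),
        s(1, 1, 2) * s(2, 1, 1) - s(1, 1, 1) * s(2, 1, 2),
        s(1, 2, 1) * s(2, 1, 1) - s(1, 1, 1) * s(2, 2, 1),
        s(1, 1, 2) * s(1, 2, 1) - s(1, 1, 1) * s(1, 2, 2),
    ]


def f_value(p):
    """The polynomial f from the proof, evaluated at the table p."""
    quartic = (p[(1, 2, 2, 2, 1)] * p[(1, 2, 2, 1, 2)] * p[(1, 2, 1, 2, 2)] * p[(1, 2, 1, 1, 1)]
               - p[(1, 2, 1, 1, 2)] * p[(1, 2, 1, 2, 1)] * p[(1, 2, 2, 1, 1)] * p[(1, 2, 2, 2, 2)])
    return plus(p, 1, 1, 1, 2) * plus(p, 2, 2, 2, 2) * quartic
```

In fact the quartic factor of $f$ already vanishes on every factorizing table: each of the conditional probabilities in the factorization occurs equally often in its two terms.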

5 Networks on Four Random Variables 

In this section we present the algebraic classification of all Bayesian networks 
on four random variables. In the binary case we have the following result. 

Theorem 10. The local and global Markov ideals of all Bayesian networks 
on four binary variables are radical. The hypothesis "binary" is essential. 

Thus the answer to [23, Problem 8.11] is affirmative for networks on four
binary nodes. Proposition 9 shows that the hypothesis "four" is essential.
Theorem 10 is proved by exhaustive computations in Macaulay2. We summarize
the results in Table 1. Each row represents one network $G$ on four
binary random variables along with some information about its two ideals

$$I_{\mathrm{local}(G)} \subseteq I_{\mathrm{global}(G)} \subseteq \mathbb{R}[p_{1111}, p_{1112}, \ldots, p_{2221}, p_{2222}].$$

Here $G$ is represented by the list of sets of children $(\mathrm{ch}(1), \mathrm{ch}(2), \mathrm{ch}(3), \mathrm{ch}(4))$.
The information given in the second column corresponds to the codimension,
degree, and number of minimal generators of the ideal $I_{\mathrm{local}(G)}$. For example,
the network in the fourth row has four directed edges $(2,1), (3,1), (4,1)$ and
$(4,2)$. Here $I_{\mathrm{local}(G)} = I_{\mathrm{global}(G)} = \ker(\Phi)$. This prime has codimension 3,
degree 4 and is generated by the six $2 \times 2$-minors of the $2 \times 4$-matrix

$$\begin{pmatrix} p_{+111} & p_{+112} & p_{+211} & p_{+212} \\ p_{+121} & p_{+122} & p_{+221} & p_{+222} \end{pmatrix}.$$

Of the 30 local Markov ideals in Table 1, all but six are prime. The remaining
six ideals are all radical, and the number of their minimal primes is listed.
Hence all local Markov ideals are radical. The last column corresponds to
the ideal $I_{\mathrm{global}(G)}$. This ideal is equal to the distinguished component for
all but two networks, namely 15 and 17. For these two networks we have
$I_{\mathrm{local}(G)} = I_{\mathrm{global}(G)}$. This proves the first assertion of Theorem 10.

The main point of this section is the second sentence in Theorem 10:
embedded components can appear when the number of levels increases. In
the next theorem we let $d_1, d_2, d_3$ and $d_4$ be arbitrary positive integers.

Theorem 11. Of the 30 local Markov ideals on four random variables, 22
are always prime, five are not prime but always radical (numbers 10, 11, 16,
18, 26 in Table 1) and three are not radical (numbers 15, 17, 21 in Table 1).
$$
\begin{array}{rllll}
\text{Index} & \text{Information (codim, degree, \#gens)} & \text{Network } (\mathrm{ch}(1), \mathrm{ch}(2), \mathrm{ch}(3), \mathrm{ch}(4)) & \text{Local} & \text{Global} \\ \hline
1 & 1, 2, 1 & \{\}, \{1\}, \{1,2\}, \{1,2\} & \text{prime} & \\
2 & 2, 4, 2 & \{\}, \{1\}, \{1\}, \{1,2,3\} & \text{prime} & \\
3 & 2, 4, 2 & \{\}, \{1\}, \{1,2\}, \{1,3\} & \text{prime} & \\
4 & 3, 4, 6 & \{\}, \{1\}, \{1\}, \{1,2\} & \text{prime} & \\
5 & 4, 6, 9 & \{\}, \{1\}, \{1\}, \{1\} & \text{prime} & \\
6 & 4, 16, 4 & \{\}, \{\}, \{1,2\}, \{1,2,3\} & \text{prime} & \\
7 & 4, 16, 4 & \{\}, \{1\}, \{1,2\}, \{2,3\} & \text{prime} & \\
8 & 4, 16, 4 & \{\}, \{1\}, \{2\}, \{1,2,3\} & \text{prime} & \\
9 & 5, 32, 5 & \{\}, \{\}, \{1,2\}, \{1,2\} & \text{prime} & \\
10 & 5, 32, 5 & \{\}, \{1\}, \{1,2\}, \{2\} & \text{prime} & \\
11 & 6, 8, 10 & \{\}, \{1\}, \{1\}, \{2\} & \text{radical, 5 comp.} & \text{prime} \\
12 & 6, 16, 12 & \{\}, \{\}, \{1\}, \{1,2,3\} & \text{prime} & \\
13 & 6, 16, 12 & \{\}, \{\}, \{1,2\}, \{2,3\} & \text{prime} & \\
14 & 6, 16, 12 & \{\}, \{1\}, \{2\}, \{2,3\} & \text{prime} & \\
15 & 6, 64, 6 & \{\}, \{1\}, \{1\}, \{2,3\} & \text{radical, 5 comp.} & \text{radical} \\
16 & 6, 64, 6 & \{\}, \{1\}, \{1,2\}, \{3\} & \text{radical, 9 comp.} & \text{prime} \\
17 & 6, 64, 6 & \{\}, \{1\}, \{2\}, \{1,3\} & \text{radical, 5 comp.} & \text{radical} \\
18 & 7, 8, 14 & \{\}, \{1\}, \{2\}, \{3\} & \text{radical, 3 comp.} & \text{prime} \\
19 & 7, 8, 28 & \{\}, \{\}, \{1\}, \{1,3\} & \text{prime} & \\
20 & 7, 24, 16 & \{\}, \{\}, \{1\}, \{1,2\} & \text{prime} & \\
21 & 7, 32, 13 & \{\}, \{1\}, \{2\}, \{2\} & \text{prime} & \\
22 & 8, 14, 31 & & \text{prime} & \\
23 & 8, 34, 20 & \{\}, \{\}, \{1\}, \{2,3\} & \text{prime} & \\
24 & 8, 36, 18 & \{\}, \{\}, \{\}, \{1,2,3\} & \text{prime} & \\
25 & 8, 36, 18 & \{\}, \{\}, \{1,2\}, \{3\} & \text{prime} & \\
26 & 9, 20, 27 & \{\}, \{\}, \{1\}, \{2\} & \text{radical, 5 comp.} & \text{prime} \\
27 & 9, 24, 34 & \{\}, \{\}, \{\}, \{1,2\} & \text{prime} & \\
28 & 9, 24, 34 & \{\}, \{\}, \{1\}, \{3\} & \text{prime} & \\
29 & 10, 20, 46 & \{\}, \{\}, \{\}, \{1\} & \text{prime} & \\
30 & 11, 24, 55 & \{\}, \{\}, \{\}, \{\} & \text{prime} &
\end{array}
$$

Table 1: All Bayesian Networks on Four Binary Random Variables



Proof. We prove this theorem by an exhaustive case analysis of all thirty
networks. In most cases, the ideal $I_{\mathrm{local}(G)}$ can be made binomial by a
suitable coordinate change, just like in the proof of Theorem 1. In fact,
let us start with a non-trivial case which is immediately taken care of by
Theorem 1.

The network 16: Here we have $\mathrm{local}(G) = \{1 \perp\!\!\!\perp 4 \mid \{2,3\},\ 2 \perp\!\!\!\perp 4 \mid 3\}$. For
a fixed value of the third node we get the model $\{1 \perp\!\!\!\perp 4 \mid 2,\ 4 \perp\!\!\!\perp 2\}$, whose ideal
was shown to be radical in Theorem 1. Hence $I_{\mathrm{local}(G)}$ is the ideal generated
by $d_3$ copies of this radical ideal in disjoint sets of variables. We conclude
that $I_{\mathrm{local}(G)}$ is radical and has $(2^{d_2} - 1)^{d_3}$ minimal primes.
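Specializing this count to binary variables ($d_2 = d_3 = 2$) gives a quick consistency check against Table 1:

$$(2^{d_2} - 1)^{d_3} = (2^2 - 1)^2 = 9,$$

which agrees with the entry "radical, 9 comp." for network 16.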
The networks 1, 2, 3, 4, 6, 7, 8, 12, 13, 14: In each of these ten cases,
the ideal $I_{\mathrm{local}(G)}$ is generated by quadratic polynomials corresponding to
a single conditional independence statement. This observation implies that
$I_{\mathrm{local}(G)}$ is a prime ideal, by [23, Lemma 8.2].

The network 5: Here $\mathrm{local}(G)$ specifies the model of complete independence
for the random variables $X_2$, $X_3$ and $X_4$. This means that $I_{\mathrm{local}(G)}$ is the
ideal of a Segre variety, which is prime and has a quadratic Gröbner basis.

The networks 24 and 25: Each of these two networks describes the join of
$d_4$ and $d_3$ Segre varieties, respectively. The same reasoning as in case 5 applies.

The network 23: Observe that $I_{\mathrm{local}(G)} = I_{\mathrm{global}(G)} = I_{1 \perp\!\!\!\perp \{2,4\} \mid 3} + I_{2 \perp\!\!\!\perp \{1,3\} \mid 4}$.
Since $G$ is a directed tree, Theorem 6 implies that $I_{\mathrm{global}(G)}$ coincides with
the distinguished prime ideal $\ker(\Phi)$. Therefore, $I_{\mathrm{local}(G)}$ is always prime.
The networks 19, 22, 27, 28, 29, 30: Each of these six networks has an
isolated vertex. This means that $I_{\mathrm{local}(G)}$ is the ideal of the Segre embedding
of the product of two smaller varieties, namely, the projective space $\mathbb{P}^{d_i - 1}$
corresponding to the isolated vertex $i$ and the scheme specified by the local
ideal of the remaining network on three nodes. The latter ideal is prime and
has a quadratic Gröbner basis, by Proposition 5, and hence so does $I_{\mathrm{local}(G)}$.

The network 20: The ideal $I_{\mathrm{local}(G)}$ is binomial in the coordinates $p_{ijkl}$ with
$i \in \{+, 2, \ldots, d_1\}$. Generators are $p_{i_1 j_1 k l}\, p_{i_2 j_2 k l} - p_{i_1 j_2 k l}\, p_{i_2 j_1 k l}$,
$p_{i_1 j_2 k_1 l}\, p_{i_2 j_1 k_2 l} - p_{i_1 j_1 k_1 l}\, p_{i_2 j_2 k_2 l}$ and
$p_{+ j_1 k_2 l_1}\, p_{+ j_2 k_1 l_2} - p_{+ j_1 k_1 l_1}\, p_{+ j_2 k_2 l_2}$. The S-pairs within
each group reduce to zero by the Gröbner basis property of the $2 \times 2$-minors
of a generic matrix. It can be checked easily that the crosswise reverse
lexicographic S-pairs also reduce to zero. We conclude that the given set of
irreducible quadrics is a reverse lexicographic Gröbner basis. In view of [22,
Lemma 12.1], the lowest variable is not a zero-divisor, and hence by
symmetry none of the variables $p_{ijkl}$ is a zero-divisor. It now follows from equation
(6) in Theorem 8 that $I_{\mathrm{local}(G)}$ coincides with the prime ideal $\ker(\Phi)$.

The network 9: The ideal $I_{\mathrm{local}}(G)$ is generated by the quadratic polynomials
$p_{i_1j_2kl}p_{i_2j_1kl} - p_{i_1j_1kl}p_{i_2j_2kl}$ and $p_{++k_1l_2}p_{++k_2l_1} - p_{++k_1l_1}p_{++k_2l_2}$. These
generators form a Grobner basis in the reverse lexicographic order. Indeed,
assuming that $i_1 < i_2$, $j_1 < j_2$, $k_1 < k_2$, $l_1 < l_2$, the leading terms are
$p_{i_1j_2kl}p_{i_2j_1kl}$ and $p_{++k_1l_1}p_{++k_2l_2}$. Hence no leading term from the first group
of quadrics shares a variable with a leading term from the second group, and
the crosswise S-pairs reduce to zero by [4, Prop. 4, §2.9]. The S-pairs
within each group also reduce to zero by the Grobner basis property of the
$2 \times 2$-minors of a generic matrix. Hence the generators are a Grobner basis.
Since the leading terms are square-free, the ideal is radical. An
argument similar to the previous case shows that $I_{\mathrm{local}}(G)$ is prime.
The network 18: Here $G$ is a directed chain of length four. We claim that
$I_{\mathrm{local}}(G)$ is the irredundant intersection of $2^{d_2} - 1$ primes, and that it has a Grobner
basis consisting of square-free binomials of degree two, three and four. We
give an outline of the proof. We first turn $I_{\mathrm{local}}(G)$ into a binomial ideal by
taking the coordinates to be $p_{ijkl}$ with $i \in \{+, 2, 3, \ldots, d_1\}$. The minimal
primes are indexed by proper subsets of $[d_2]$. For each such subset $\sigma$ we
introduce the monomial prime $M_\sigma = \langle p_{+jkl} : j \in \sigma, k \in [d_3], l \in [d_4] \rangle$ and
the complementary monomial $m_\sigma = \prod_{j \in [d_2] \setminus \sigma} \prod_{k \in [d_3]} \prod_{l \in [d_4]} p_{+jkl}$, and we
define the ideal $P_\sigma = ((I_{\mathrm{local}}(G) + M_\sigma) : m_\sigma^\infty)$. These ideals are prime, and
the union of their varieties is irredundant and equals the variety of $I_{\mathrm{local}}(G)$.
Using Buchberger's S-pair criterion, we check that the following four types
of square-free binomials are a Grobner basis:

• the generators $p_{i_1jk_1l_1}p_{i_2jk_2l_2} - p_{i_1jk_2l_2}p_{i_2jk_1l_1}$ encoding $1 \perp\!\!\!\perp \{3,4\} \mid 2$,

• the generators $p_{+j_1kl_1}p_{+j_2kl_2} - p_{+j_1kl_2}p_{+j_2kl_1}$ encoding $2 \perp\!\!\!\perp 4 \mid 3$,

• the CubicS {p+jikhPij2kl2 - P+jikl2Pij2kh) ■P+j2k3h-> 

• the quartics {PhnkhPi2j2ki2 - Piijiki2Pi2j2kh) ■ P+hhhi ■ p+j^hk^- 

The network 10: The ideal $I_{\mathrm{local}}(G)$ is generated by $p_{i_1jkl_2}p_{i_2jkl_1} - p_{i_1jkl_1}p_{i_2jkl_2}$
and $p_{++k_1l_2}p_{++k_2l_1} - p_{++k_1l_1}p_{++k_2l_2}$. In general, this ideal is not prime, but
it is always radical. If $d_4 = 2$, then the ideal is always prime. If $d_4 > 2$, then
$I_{\mathrm{local}}(G)$ is the intersection of the distinguished component and $2^{d_3} - 1$ prime
ideals indexed by all proper subsets $\sigma \subset [d_3]$, as in the previous network.

The network 11: Here, $\mathrm{local}(G) = \{1 \perp\!\!\!\perp 4 \mid \{2,3\},\ 2 \perp\!\!\!\perp 3 \mid 4,\ 3 \perp\!\!\!\perp \{2,4\}\}$. The
ideal $I_{\mathrm{local}}(G)$ is binomial in the coordinates $p_{ijkl}$ with $i \in \{+, 2, \ldots, d_1\}$. It
is generated by the binomials $p_{i_1jkl_1}p_{i_2jkl_2} - p_{i_1jkl_2}p_{i_2jkl_1}$ and $p_{+j_1k_1l_1}p_{+j_2k_2l_2} -
p_{+j_1k_2l_1}p_{+j_2k_1l_2}$, which encode the first and third independence statements. The
minimal primes are indexed by pairs of proper subsets of $[d_2]$ and $[d_3]$. For
each such pair of subsets $(\sigma, \tau)$ we introduce the monomial prime $M_{(\sigma,\tau)} =
\langle p_{+jkl} : j \in \sigma, k \in \tau, l \in [d_4] \rangle$ and the complementary monomial $m_{(\sigma,\tau)} =
\prod_{j \in [d_2] \setminus \sigma} \prod_{k \in [d_3] \setminus \tau} \prod_{l \in [d_4]} p_{+jkl}$, and we define the ideal $P_{(\sigma,\tau)} = ((I_{\mathrm{local}}(G) +
M_{(\sigma,\tau)}) : m_{(\sigma,\tau)}^\infty)$. These ideals are prime, and the union of their varieties
equals the variety of $I_{\mathrm{local}}(G)$. Moreover, the ideal $I_{\mathrm{local}}(G)$ is equal to the
intersection of the minimal primes which are indexed by the following pairs:
for each proper $\tau \subset [d_3]$ the pair $(\emptyset, \tau)$, and for each nonempty proper
$\sigma \subset [d_2]$ the pairs $(\sigma, \tau)$ where $\tau \subset [d_3]$ is any subset of cardinality at
most $d_3 - 2$. In particular, for $d_2 = d_3 = 3$ and arbitrary $d_1, d_4$, the ideal
$I_{\mathrm{local}}(G)$ has 31 prime components. For $d_2 = 2$, $d_3 = 4$, $I_{\mathrm{local}}(G)$ has 37 prime
components, and for $d_2 = 4$, $d_3 = 2$, $I_{\mathrm{local}}(G)$ has 17 prime components.
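The three component counts quoted for network 11 (31, 37 and 17) follow from the indexing rule by direct enumeration. The sketch below is a hypothetical helper: the names `proper_subsets` and `network11_components` are illustrative and do not come from the authors' code. It lists the pairs $(\sigma, \tau)$ exactly as the rule describes them and counts them.

```python
from itertools import combinations


def proper_subsets(n):
    """All subsets of {1, ..., n} except {1, ..., n} itself (the empty set is included)."""
    items = range(1, n + 1)
    return [frozenset(s) for k in range(n) for s in combinations(items, k)]


def network11_components(d2, d3):
    """Count the pairs (sigma, tau) indexing the prime components of
    I_local(G) for network 11, following the rule stated in the text."""
    # The pairs (empty set, tau) for every proper subset tau of [d3].
    pairs = [(frozenset(), tau) for tau in proper_subsets(d3)]
    # The pairs (sigma, tau) with sigma nonempty and proper, and |tau| <= d3 - 2.
    for sigma in proper_subsets(d2):
        if sigma:
            pairs.extend((sigma, tau) for tau in proper_subsets(d3) if len(tau) <= d3 - 2)
    return len(pairs)


for d2, d3 in [(3, 3), (2, 4), (4, 2)]:
    print(f"d2={d2}, d3={d3}: {network11_components(d2, d3)} components")
```

The loop prints 31, 37 and 17, in agreement with the counts stated above.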

The network 26: The ideal $I_{\mathrm{local}}(G)$ is a radical ideal. The minimal primes
are indexed by all pairs of proper subsets of $[d_3]$ and $[d_4]$. For each such pair
$(\sigma, \tau)$ we introduce the monomial primes $M_\sigma = \langle p_{+jkl} : k \in \sigma, j \in [d_2], l \in
[d_4] \rangle$, $M_\tau = \langle p_{i+kl} : l \in \tau, i \in [d_1], k \in [d_3] \rangle$, and $M_{(\sigma,\tau)} = M_\sigma + M_\tau$. Just
as before, we introduce the complementary monomial $m_{(\sigma,\tau)}$ and the ideal
$P_{(\sigma,\tau)} = ((I_{\mathrm{local}}(G) + M_{(\sigma,\tau)}) : m_{(\sigma,\tau)}^\infty)$. The ideal $I_{\mathrm{local}}(G)$ is equal to the
intersection of all these prime ideals.

The network 21: Here, $\mathrm{local}(G) = \{1 \perp\!\!\!\perp \{3,4\} \mid 2,\ 3 \perp\!\!\!\perp 4\}$. The ideal $I_{\mathrm{local}}(G)$
is generated by the binomials $p_{i_1jk_2l_2}p_{i_2jk_1l_1} - p_{i_1jk_1l_1}p_{i_2jk_2l_2}$ and the polynomials
$p_{++k_1l_2}p_{++k_2l_1} - p_{++k_1l_1}p_{++k_2l_2}$. This ideal is not radical, in general.
The first counterexample occurs for the case $d_1 = d_2 = d_3 = 2$ and $d_4 = 3$.
Here $I_{\mathrm{local}}(G)$ is generated by 33 quadratic polynomials in 24 unknowns. The
degree reverse lexicographic Grobner basis of this ideal consists of 123 polynomials
of degree up to 8. In this case, $I_{\mathrm{local}}(G)$ is the intersection of the
distinguished component and the $P$-primary ideal $Q = I_{1 \perp\!\!\!\perp \{3,4\} \mid 2} + \ldots$,
where $P$ is the prime ideal generated by the 12 linear forms $p_{+jkl}$.

The networks 15 and 17: Here, after relabeling network 17, $\mathrm{local}(G) =
\{1 \perp\!\!\!\perp 4 \mid \{2,3\},\ 2 \perp\!\!\!\perp 3 \mid 4\}$. The ideal $I_{\mathrm{local}}(G)$ is binomial in the coordinates
$p_{ijkl}$ with $i \in \{+, 2, \ldots, d_1\}$. It is generated by the binomials $p_{i_1jkl_1}p_{i_2jkl_2} -
p_{i_1jkl_2}p_{i_2jkl_1}$ and $p_{+j_1k_1l}p_{+j_2k_2l} - p_{+j_1k_2l}p_{+j_2k_1l}$. This ideal is not radical, in
general. The first counterexample occurs for the case $d_1 = 2$ and $d_2 = d_3 = d_4 =
3$. Here $I_{\mathrm{local}}(G)$ is generated by 54 quadratic binomials in 54 unknowns. The
reverse lexicographic Grobner basis consists of 13,038 binomials of degree
up to 14. One of the elements in the Grobner basis is

$$p_{+111}\,p_{+223}\,(p_{+331})^2 \cdot (p_{2122}p_{2133}p_{2323}p_{2332} - p_{2333}p_{2322}p_{2132}p_{2123}).$$

Removing the square from the third factor, we obtain a polynomial $f$ of
degree 7 such that $f \notin I$ but $f^2 \in I$. This proves that $I$ is not radical.
The number of minimal primes of $I_{\mathrm{local}}(G)$ is equal to $2^{d_2} + 2^{d_3} - 3$. □
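The generator counts quoted for network 21 (33 quadrics in 24 unknowns) and for networks 15 and 17 (54 quadrics in 54 unknowns) can be cross-checked by counting $2 \times 2$ minors. For each joint state of $C$, a statement $A \perp\!\!\!\perp B \mid C$ contributes the $2 \times 2$ minors of a $(\prod_{a \in A} d_a) \times (\prod_{b \in B} d_b)$ matrix. The helper `ci_quadric_count` is a hypothetical sketch of this count. It counts minors only and does not by itself prove that they are minimal generators.

```python
from math import comb, prod


def ci_quadric_count(dims, A, B, C):
    """Number of 2x2 minors encoding the statement A _||_ B | C.

    dims maps each node to its number of states. For each joint state of C
    there is a (prod over A) x (prod over B) matrix; nodes outside A, B, C
    are summed out and do not affect the count.
    """
    rows = prod(dims[v] for v in A)
    cols = prod(dims[v] for v in B)
    return prod(dims[v] for v in C) * comb(rows, 2) * comb(cols, 2)


# Network 21 with d1 = d2 = d3 = 2, d4 = 3: statements 1 _||_ {3,4} | 2 and 3 _||_ 4.
dims21 = {1: 2, 2: 2, 3: 2, 4: 3}
n21 = ci_quadric_count(dims21, [1], [3, 4], [2]) + ci_quadric_count(dims21, [3], [4], [])

# Networks 15/17 with d1 = 2, d2 = d3 = d4 = 3: statements 1 _||_ 4 | {2,3} and 2 _||_ 3 | 4.
dims15 = {1: 2, 2: 3, 3: 3, 4: 3}
n15 = ci_quadric_count(dims15, [1], [4], [2, 3]) + ci_quadric_count(dims15, [2], [3], [4])

print(n21, prod(dims21.values()))  # quadrics, unknowns
print(n15, prod(dims15.values()))
```

The two lines print `33 24` and `54 54`.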

In the 22 cases where $I_{\mathrm{local}}(G)$ is prime, it follows from Theorem 8 that the
global Markov ideal $I_{\mathrm{global}}(G)$ is prime as well. Among the remaining cases, we
have $I_{\mathrm{local}}(G) = I_{\mathrm{global}}(G)$ for networks 10, 15, 17, 21, and we have $I_{\mathrm{local}}(G) \neq
I_{\mathrm{global}}(G) = \ker(\Phi)$ for networks 11, 16, 18, 26. This discussion implies:

Corollary 12. Of the 30 global Markov ideals on four random variables, 26
are always prime, one is not prime but always radical (number 10 in Table
1), and three are not radical (numbers 15, 17, 21 in Table 1).

It is instructive to examine the distinguished prime ideal $P = \ker(\Phi)$ in
the last case, networks 15 and 17. Assume for simplicity that $d_1 = 2$, while $d_2$, $d_3$ and $d_4$
are arbitrary positive integers. We rename the unknowns $x_{jkl} = p_{2jkl}$ and
$y_{jkl} = p_{+jkl}$. Then we can take $\Phi$ to be the following monomial map:

$$\Phi: \mathbb{R}[x_{jkl}, y_{jkl}] \to \mathbb{R}[u_{jk}, v_{jl}, w_{kl}], \quad x_{jkl} \mapsto u_{jk}v_{jl}w_{kl}, \quad y_{jkl} \mapsto v_{jl}w_{kl}. \qquad (12)$$

For example, for $d_2 = d_3 = 3$ and $d_4 = 2$, the ideal $P = \ker(\Phi)$ has 361
minimal generators, of degrees ranging from two to seven. One generator is

$$x_{111}x_{132}x_{222}x_{312}x_{321}y_{221}y_{331} - x_{112}x_{131}x_{221}x_{311}x_{322}y_{232}y_{321}.$$

Among the 361 minimal generators, there are precisely 15 which do not
contain any variable $y_{jkl}$, namely, nine quartics and six sextics like

$$x_{112}x_{121}x_{211}x_{232}x_{322}x_{331} - x_{111}x_{122}x_{212}x_{231}x_{321}x_{332}.$$

These 15 generators form the Markov basis for the $3 \times 3 \times 2$-tables in the
no-three-way interaction model. See [22, Corollary 14.12] for a discussion.
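$\Phi$ in (12) is a monomial map, so a binomial lies in $\ker(\Phi)$ exactly when its two terms have the same image. The sketch below checks this for the degree-seven and degree-six binomials displayed above. It is a hypothetical check that encodes each unknown as a string such as `x132` for $x_{132}$, which assumes single-digit indices.

```python
from collections import Counter


def phi(var):
    """Image under the map (12) of one unknown, as a Counter of parameters.

    var is 'x' or 'y' followed by the indices j, k, l, e.g. 'x132'.
    x_{jkl} -> u_{jk} v_{jl} w_{kl}   and   y_{jkl} -> v_{jl} w_{kl}.
    """
    kind, j, k, l = var
    image = Counter({f"v{j}{l}": 1, f"w{k}{l}": 1})
    if kind == "x":
        image[f"u{j}{k}"] += 1
    return image


def phi_monomial(monomial):
    """Image of a product of unknowns written as space-separated names."""
    total = Counter()
    for var in monomial.split():
        total.update(phi(var))
    return total


def in_kernel(lhs, rhs):
    """lhs - rhs lies in ker(Phi) iff both terms map to the same monomial."""
    return phi_monomial(lhs) == phi_monomial(rhs)


degree7 = ("x111 x132 x222 x312 x321 y221 y331", "x112 x131 x221 x311 x322 y232 y321")
sextic = ("x112 x121 x211 x232 x322 x331", "x111 x122 x212 x231 x321 x332")
print(in_kernel(*degree7), in_kernel(*sextic))  # True True
print(in_kernel("x111 x222", "x112 x221"))      # False
```

The last line is a control: that binomial is not homogeneous with respect to the $v_{jl}$ parameters, so it is not in the kernel.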

The ideal for the no-three-way interaction model of $d_2 \times d_3 \times d_4$-tables
always coincides with the elimination ideal $P \cap \mathbb{R}[x_{jkl}]$. Moreover, every
generating set of $P$ contains a generating set for $P \cap \mathbb{R}[x_{jkl}]$. In view of
[22, Proposition 14.14], this shows that the maximal degree among minimal
generators of $P$ exceeds any bound as $d_2$, $d_3$, $d_4$ increase. In practical terms,
it is hard to compute these generators even for $d_2 = d_3 = d_4 = 4$. We refer to
the web page http://math.berkeley.edu/~seths/ccachallenge.html.

6 Networks on Five Binary Random Variables 

In this section we discuss the global Markov ideals of all Bayesian networks 
on five binary random variables. In each case we computed the primary 
decomposition. In general, the built-in primary decomposition algorithms
in current computer algebra systems cannot compute the primary decompo- 
sitions of most of these ideals. In the Appendix, we outline some techniques 
that allowed us to compute these decompositions. The primary decomposi- 
tions of the local Markov ideals of these networks could also be computed, 
but they have less regular structure and are in general more complicated. 

There are 301 distinct non-complete networks on five random variables,
up to isomorphism of directed graphs. We have placed descriptions of these
networks and of the primary decompositions of their global Markov ideals on
the website http://math.cornell.edu/~mike/bayes/global5.html. In
this section, we refer to the graphs as $G_0, G_1, \ldots, G_{300}$, the indices matching
the information on the website. We summarize our results in a theorem.

Theorem 13. Of the 301 global Markov ideals on five binary random vari- 
ables, 220 are prime, 68 are radical but not prime, and 13 are not radical. 

Proof. The proof is via direct computation with each of these ideals in 
Macaulay2. Some of these require little or no computation: if G is a di- 
rected forest, or if there is only one independence statement, then the ideal 
is prime. Others require substantial computation and some ingenuity to find 
the primary decomposition. Results are posted at the website cited above. 

To prove primality, it suffices to compute the ideal quotient of $I =
I_{\mathrm{global}}(G)$ with respect to a small subset of the $p_{+\cdots+u_r\cdots u_n}$. Alternatively,
one may birationally project $I$ by eliminating variables, as in Proposition 23.
In either case, if a zero divisor $x$ is found, the ideal is not prime. If some
ideal quotient satisfies $(I : x^2) \neq (I : x)$, then $I$ is not radical. □

The numbers of prime components of the 288 radical global Markov
ideals range from 1 to 39. The distribution is given in the following table:

# of components     1     3     5     7    17    25    29    33    39
# of ideals       220     8    41     3     9     1     2     3     1

Theorem 14. Conjecture\^is true for Bayesian networks G on five binary 
random variables. In each of the 301 cases, the distinguished prime ideal 
$\ker(\Phi)$ is generated by homogeneous polynomials of degree at most eight.

Proof. We compute the distinguished component from $I_{\mathrm{global}}(G)$ by saturation,
and we check the result by using the techniques in the Appendix.
The computation of the distinguished component of the 81 non-prime examples
shows that 64 of these ideals are generated in degrees $\le 4$, twelve
are generated in degrees $\le 6$, and five are generated in degrees $\le 8$. □

Theorem 8 says that we can decide primality or find the distinguished
component of $I_{\mathrm{global}}(G)$ by inverting each of the $p_{+\cdots+u_r\cdots u_n}$. With some care,
it is possible to reduce this to a smaller set. Still, the following is unexpected.

Proposition 15. For all but two networks on five binary random variables,
$p_{+1111}$ is a non-zero divisor on $I = I_{\mathrm{global}}(G)$ if and only if $I$ is prime. In all
but these two examples, $I$ is radical if and only if $(I : p_{+1111}^2) = (I : p_{+1111})$.

Proof. The networks which do not satisfy the given property are $G_{201} =
(\{\}, \{1\}, \{1,2\}, \{1,2\}, \{3,4\})$ and $G_{214} = (\{\}, \{1\}, \{1,2\}, \{3\}, \{1,2,4\})$. After
permuting the nodes 4 and 5, both the local and global independence statements
of $G_{214}$ are the same as those for $G_{201}$. The global independence
statements for $G_{201}$ are $\{\{1,2\} \perp\!\!\!\perp 5 \mid \{3,4\},\ 3 \perp\!\!\!\perp 4 \mid 5\}$. The primary decomposition
of the radical ideal $I = I_{\mathrm{global}}(G_{201})$ is

$$I = \ker(\Phi) \cap (I + P_{++1\bullet\bullet}) \cap (I + P_{++2\bullet\bullet}) \cap (I + P_{++\bullet1\bullet}) \cap (I + P_{++\bullet2\bullet}),$$

where $\ker(\Phi)$ is the distinguished prime component,

$$P_{++1\bullet\bullet} = \langle p_{++111}, p_{++112}, p_{++121}, p_{++122} \rangle,$$

and the other three components are defined in an analogous manner. Therefore,
$p_{+1111}$ is a non-zero divisor modulo $I$, even though $I$ is not prime. By examining all 81 non-prime
ideals, we see that all except these two have a minimal prime containing $p_{+1111}$.
The final statement also follows from direct computation. □

We have searched for conditions on the network which would characterize
when the global Markov ideal is prime, or fails to be prime.
Theorem 8 states that if the network is a directed forest, then the global
Markov ideal is prime. Two possible conditions, the first for primality and
the second for non-primality, are close, but not quite right. We present
them, with their counterexamples, in the following two propositions.

Proposition 16. There is a unique network $G$ on 5 binary nodes whose
underlying undirected graph is a tree, but $I_{\mathrm{global}}(G)$ is not radical. Every other
network whose underlying graph is a tree has a prime global Markov ideal.
Proof. The unique network is $G_{23} = (\{\}, \{1\}, \{2\}, \{2\}, \{2\})$. Its local and
global Markov independence statements coincide and are equal to

$$\{1 \perp\!\!\!\perp \{3,4,5\} \mid 2,\ 3 \perp\!\!\!\perp \{4,5\},\ 4 \perp\!\!\!\perp \{3,5\},\ 5 \perp\!\!\!\perp \{3,4\}\}.$$

Computation using Macaulay2 reveals

$$I_{\mathrm{global}}(G_{23}) = \ker(\Phi) \cap (I_{\mathrm{global}}(G_{23}) + P_{+\bullet\bullet\bullet\bullet}^2),$$

where $P_{+\bullet\bullet\bullet\bullet}$ is the ideal generated by the 16 linear forms $p_{+u_2u_3u_4u_5}$. Inspecting
the 81 non-prime ideals shows that $G_{23}$ is the only example. □

We say that the network $G$ has an induced $r$-cycle if there is an induced
subgraph $H$ of $G$ with $r$ vertices which consists of two disjoint directed paths
which share the same start point and end point.

Proposition 17. Of the 301 networks on five nodes, 70 have an induced
4-cycle or 5-cycle. For exactly two of these, the ideal $I_{\mathrm{global}}(G)$ is prime.

Proof. Once again, this follows by examination of the 301 cases. The graphs
which have an induced 4-cycle but whose global Markov ideal is prime are

$$G_{265} = (\{\}, \{1\}, \{1,2\}, \{1,2\}, \{2,3,4\}) \quad\text{and}\quad G_{269} = (\{\}, \{1\}, \{1,2\}, \{2,3\}, \{1,2,4\}).$$

Removing node 2 results in a 4-cycle. The local and global Markov statements
are all the same up to relabeling: $\{1 \perp\!\!\!\perp 5 \mid \{2,3,4\},\ 3 \perp\!\!\!\perp 4 \mid 5\}$. □

There are four graphs with three induced 4-cycles, namely, $G_{138}$, $G_{139}$,
$G_{150}$, $G_{157}$. The first two graphs give rise to the same (global or local)
independence statements, and similarly for the last two. The ideal $I_{\mathrm{global}}(G_{138})$
has the most components of any of the 301 ideals considered in this section.

Example 18. The network $G_{138} = (\{\}, \{1\}, \{1\}, \{1\}, \{2,3,4\})$ is isomorphic
to the one in Proposition |^ Its ideal $I_{\mathrm{global}}(G_{138})$ has 207 minimal
primes and 37 embedded primes. Each of the 207 minimal primary components
is prime. We will describe the structure of these components.

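Let $F_{i_1i_2i_3} = \det\begin{pmatrix} p_{+i_1i_2i_31} & p_{+i_1i_2i_32} \\ p_{2i_1i_2i_31} & p_{2i_1i_2i_32} \end{pmatrix}$. Let $J_i$ be the ideal generated by
the $2 \times 2$ minors located in the first two rows or columns of the matrix

$$\begin{pmatrix} p_{+111i} & p_{+112i} & p_{+211i} & p_{+212i} \\ p_{+121i} & p_{+122i} & p_{+221i} & p_{+222i} \\ p_{+211i} & p_{+212i} & * & * \\ p_{+221i} & p_{+222i} & * & * \end{pmatrix}.$$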
We have

$$I := I_{\mathrm{global}}(G_{138}) = J_1 + J_2 + \langle F_{111}, F_{112}, \ldots, F_{222} \rangle.$$

Each $J_i$ is minimally generated by 9 quadrics, so that $I$ is minimally generated
by 26 quadrics. Each $J_i$ is prime of codimension 4, and so $J_1 + J_2$
is prime of codimension 8. Since there are only 8 more quadrics, Krull's
principal ideal theorem tells us that all minimal primes have codimension
at most 16, which is also the codimension of the distinguished component.
Note that $I$ is a binomial ideal in the unknowns $p_{+u_2u_3u_4u_5}$ and $p_{2u_2u_3u_4u_5}$.
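The count of 9 quadrics per $J_i$ can be checked by linear algebra. The minors taken from the first two rows and from the first two columns of the matrix above are 12 binomials, some of them repeated, and the dimension of their linear span is the number of minimal quadric generators. The sketch below is a hypothetical check, not the authors' code. It encodes the unknown $p_{+vi}$, for a fixed $i$, by the triple $v$ and computes that dimension by exact rational Gaussian elimination.

```python
from fractions import Fraction
from itertools import combinations

# The matrix defining J_i for a fixed i; the triple (u2, u3, u4) stands for p_{+u2 u3 u4 i}.
# None marks the '*' entries, which never enter the minors used below.
M = [
    [(1, 1, 1), (1, 1, 2), (2, 1, 1), (2, 1, 2)],
    [(1, 2, 1), (1, 2, 2), (2, 2, 1), (2, 2, 2)],
    [(2, 1, 1), (2, 1, 2), None, None],
    [(2, 2, 1), (2, 2, 2), None, None],
]


def minor(r1, r2, c1, c2):
    """2x2 minor as a dict {monomial: coefficient}; a monomial is a sorted pair of unknowns."""
    poly = {}
    for (a, b), sign in (((M[r1][c1], M[r2][c2]), 1), ((M[r1][c2], M[r2][c1]), -1)):
        mono = tuple(sorted((a, b)))
        poly[mono] = poly.get(mono, 0) + sign
    return poly


def span_dimension(polys):
    """Rank of the coefficient matrix of the given polynomials (exact arithmetic)."""
    monos = sorted({m for p in polys for m in p})
    rows = [[Fraction(p.get(m, 0)) for m in monos] for p in polys]
    rank = 0
    for col in range(len(monos)):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(len(rows)):
            if r != rank and rows[r][col] != 0:
                factor = rows[r][col] / rows[rank][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rank


quadrics = [minor(0, 1, c1, c2) for c1, c2 in combinations(range(4), 2)]  # first two rows
quadrics += [minor(r1, r2, 0, 1) for r1, r2 in combinations(range(4), 2)]  # first two columns
per_J = span_dimension(quadrics)
print(per_J, 2 * per_J + 8)  # quadrics in each J_i, and in J_1 + J_2 + <F_111, ..., F_222>
```

The script prints `9 26`. The 12 minors contain two duplicates and one linear relation among four of the remaining binomials, and this relation accounts for the drop from 10 distinct binomials to 9.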



# primes   codim   degree   faces
6          14      48       (f, f), f a facet
12         14      4        (e, e), e an edge
24         16      15       (f_1, f_2), f_1 ∩ f_2 is an edge
48         16      4        (f, e), f ∩ e is a point
12         16      1        (e_1, e_2), 2 antipodal edges
48         16      1        (e_1, e_2), 2 non-parallel disjoint edges
48         16      1        (e, p), point p on the edge antipodal to e
8          16      1        (p_1, p_2), antipodal points
1          16      2316     distinguished component

Table 2: All 207 minimal primes of the ideal $I_{\mathrm{global}}(G_{138})$

Let $\Delta$ be the unit cube, with vertices $(1,1,1), (1,1,2), \ldots, (2,2,2)$. If
$\sigma \subset \Delta$ is a face, define $P_{\sigma,i}$ to be the monomial prime generated by $\{p_{+vi} :
v \notin \sigma\}$, for $i \in \{1, 2\}$. If $P$ is a minimal prime of $I$ which is not the
distinguished component, then $P$ must contain some $p_{+v_1v_2v_31}$, and also
contain some $p_{+u_1u_2u_32}$. Therefore, there are faces $\sigma_1$ and $\sigma_2$ of $\Delta$ such that
$P$ contains $P_{\sigma_1,1} + P_{\sigma_2,2}$ and does not contain any other elements $p_{+vi}$. Let
$m_{\sigma_1\sigma_2}$ be the product of all of the $p_{+vi}$ such that $v \in \sigma_i$ for $i = 1, 2$. It turns
out that every minimal prime ideal of $I$ has the form

$$P_{\sigma_1,\sigma_2} := ((I + P_{\sigma_1,1} + P_{\sigma_2,2}) : m_{\sigma_1\sigma_2}^\infty)$$

for some pair $\sigma_1, \sigma_2$ of proper faces of the cube $\Delta$. However, not all pairs of
faces correspond to minimal primes. There are 27 proper faces of the cube (counting
the empty face), and so there are $27^2 = 729$ possible minimal primes. Only 206 of these
occur. The list of minimal primes is given in Table 2. □
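The face combinatorics behind Table 2 can be checked mechanically. The sketch below enumerates the faces of the cube $\{1,2\}^3$ and counts the ordered pairs $(\sigma_1, \sigma_2)$ in each row of Table 2. We read the rows "(f, e)" and "(e, p)" as counting both orders of the two faces. This reading is our assumption, and under it the rows add up to the 206 non-distinguished primes. All helper names are hypothetical.

```python
from itertools import product

# A nonempty face of the cube with vertices in {1,2}^3 fixes some coordinates
# and leaves the others free: per coordinate, {1}, {2} or {1, 2}.
CHOICES = (frozenset({1}), frozenset({2}), frozenset({1, 2}))
FACES = list(product(CHOICES, repeat=3))  # 27 nonempty faces, including the cube itself


def dim(face):
    return sum(len(c) == 2 for c in face)


def meet(a, b):
    """Intersection of two faces, or None if they are disjoint."""
    inter = tuple(x & y for x, y in zip(a, b))
    return None if any(not c for c in inter) else inter


def antipode(face):
    """Reflect each fixed coordinate (1 <-> 2); free coordinates stay free."""
    return tuple(c if len(c) == 2 else frozenset({3 - min(c)}) for c in face)


def meets_in_dim(a, b, d):
    m = meet(a, b)
    return m is not None and dim(m) == d


def free_axis(edge):
    return next(i for i, c in enumerate(edge) if len(c) == 2)


facets = [f for f in FACES if dim(f) == 2]
edges = [f for f in FACES if dim(f) == 1]
points = [f for f in FACES if dim(f) == 0]

# Ordered pairs (sigma_1, sigma_2) for each row of Table 2 (distinguished component excluded).
rows = {
    "(f, f)": len(facets),
    "(e, e)": len(edges),
    "(f1, f2), meeting in an edge": sum(a != b and meets_in_dim(a, b, 1) for a in facets for b in facets),
    "(f, e), meeting in a point": 2 * sum(meets_in_dim(f, e, 0) for f in facets for e in edges),
    "(e1, e2), antipodal": sum(b == antipode(a) for a in edges for b in edges),
    "(e1, e2), non-parallel, disjoint": sum(
        free_axis(a) != free_axis(b) and meet(a, b) is None for a in edges for b in edges),
    "(e, p), p on the antipodal edge": 2 * sum(meet(antipode(e), p) == p for e in edges for p in points),
    "(p1, p2), antipodal": sum(q == antipode(p) for p in points for q in points),
}
proper_faces = sum(dim(f) < 3 for f in FACES) + 1  # + 1 for the empty face
print(rows)
print(proper_faces, proper_faces ** 2, sum(rows.values()) + 1)  # 27 729 207
```

The row counts come out as 6, 12, 24, 48, 12, 48, 48 and 8, which match the first column of Table 2.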

Bayesian networks give rise to very interesting (new and old) constructions
in algebraic geometry. In the next section, we shall encounter secant
varieties. Here, we offer a generalization of Example 18 to arbitrary toric
varieties. Let $I_A \subset \mathbb{R}[z_1, \ldots, z_n]$ be any toric ideal, specified as in [22] by a
point configuration $A = \{a_1, \ldots, a_n\} \subset \mathbb{Z}^d$. Let $\Delta$ be the convex hull of $A$
in $\mathbb{R}^d$. We define the double join of the toric ideal $I_A$ to be the new ideal

$$I_A(x) + I_A(y) + \langle F_1, \ldots, F_n \rangle \subset \mathbb{R}[x_1, \ldots, x_n, y_1, \ldots, y_n, a_1, \ldots, a_n, b_1, \ldots, b_n],$$

where $F_i = \det\begin{pmatrix} x_i & y_i \\ a_i & b_i \end{pmatrix}$, and $I_A(x)$ and $I_A(y)$ are generated by copies of $I_A$
in $\mathbb{R}[x_1, \ldots, x_n]$ and $\mathbb{R}[y_1, \ldots, y_n]$, respectively. The ideal $I$ in Example 18
is the double join of the Segre variety $\mathbb{P}^1 \times \mathbb{P}^1 \times \mathbb{P}^1 \subset \mathbb{P}^7$, which is the toric
variety whose polytope $\Delta$ is the 3-cube. In general, the minimal primes of
the double join of $I_A$ are indexed by pairs of faces of the polytope $\Delta$. We
believe that this construction deserves the attention of algebraic geometers.



7 Hidden Variables and Higher Secant Varieties 

Let $G$ be a Bayesian network on $n$ discrete random variables and let $P_G =
\ker(\Phi)$ be its homogeneous prime ideal in the polynomial ring $\mathbb{R}[D]$, whose
indeterminates $p_{i_1i_2\cdots i_n}$ represent probabilities of events $(i_1, i_2, \ldots, i_n) \in
D$. We now consider the situation when some of the random variables are
hidden. After relabeling we may assume that the variables corresponding to
the nodes $r+1, \ldots, n$ are hidden, while the random variables corresponding
to the nodes $1, \ldots, r$ are observed. Thus the observable probabilities are

$$p_{i_1i_2\cdots i_r+\cdots+} = \sum_{j_{r+1} \in [d_{r+1}]} \sum_{j_{r+2} \in [d_{r+2}]} \cdots \sum_{j_n \in [d_n]} p_{i_1i_2\cdots i_r j_{r+1} j_{r+2} \cdots j_n}.$$

We write $D' = [d_1] \times \cdots \times [d_r]$ and $\mathbb{R}[D']$ for the polynomial subring of $\mathbb{R}[D]$
generated by the observable probabilities $p_{i_1i_2\cdots i_r+\cdots+}$. Let $\pi: \mathbb{R}^D \to \mathbb{R}^{D'}$
denote the canonical linear epimorphism induced by the inclusion of $\mathbb{R}[D']$
in $\mathbb{R}[D]$. We are interested in the following inclusions of semi-algebraic sets:

$$\pi(V_{\ge 0}(P_G)) \subseteq \pi(V(P_G))_{\ge 0} \subseteq \pi(V(P_G)) \subseteq \overline{\pi(V(P_G))} \subseteq \mathbb{R}^{D'}. \qquad (13)$$

These inclusions are generally all strict. In particular, the space $\pi(V_{\ge 0}(P_G))$,
which consists of all observable probability distributions, is often much smaller
than the space $\pi(V(P_G))_{\ge 0}$, which consists of the probability distributions on $D'$
that would be observable if negative or complex numbers were allowed
for the hidden parameters. However, they have the same Zariski closure:
Proposition 19. The set of all polynomial functions which vanish on the
space $\pi(V_{\ge 0}(P_G))$ of observable probability distributions is the prime ideal

$$Q_G = P_G \cap \mathbb{R}[D']. \qquad (14)$$



Proof. The elimination ideal $Q_G \subset \mathbb{R}[D']$ is prime because $P_G \subset \mathbb{R}[D]$ is
a prime ideal. By the Closure Theorem of Elimination Theory [4, Theorem
3, §3.2], the ideal $Q_G$ is the vanishing ideal of the image $\pi(V(P_G))$. Since
$V_{\ge 0}(P_G)$ is Zariski dense in $V(P_G)$ by the Factorization Theorem [21], and $\pi$
is a linear map, it follows that $\pi(V_{\ge 0}(P_G))$ is Zariski dense in $\pi(V(P_G))$. □

We wish to demonstrate how computational algebraic geometry can be
used to study hidden random variables in Bayesian networks. To this end
we apply the concepts introduced above to a standard example from the
statistics literature [7], [17], [18]. We fix the network $G$ which has $n + 1$
random variables $F_1, \ldots, F_n, H$ and $n$ directed edges $(H, F_i)$, $i = 1, 2, \ldots, n$.
This is the naive Bayes model. The variable $H$ is the hidden variable, and
its levels $1, 2, \ldots, d_{n+1} =: r$ are called the classes. The observed random
variables $F_1, \ldots, F_n$ are the features of the model. In this example, the
prime ideal $P_G$ coincides with the local ideal $I_{\mathrm{local}}(G)$, which is specified by
requiring that, for each fixed class, the features are completely independent:

$$F_1 \perp\!\!\!\perp F_2 \perp\!\!\!\perp \cdots \perp\!\!\!\perp F_n \mid H.$$

This ideal is obtained as the kernel of the map $p_{i_1i_2\cdots i_nk} \mapsto x_{i_1}y_{i_2}\cdots z_{i_n}$,
one copy for each fixed class $k$, and then adding up these $r$ prime ideals.
Equivalently, $P_G$ is the ideal of the join of $r$ copies of the Segre variety

$$X_{d_1,d_2,\ldots,d_n} = \mathbb{P}^{d_1-1} \times \mathbb{P}^{d_2-1} \times \cdots \times \mathbb{P}^{d_n-1}. \qquad (15)$$

The points on $X_{d_1,d_2,\ldots,d_n}$ represent tensors of rank $\le 1$. Our linear map $\pi$
takes an $r$-tuple of tensors of rank $\le 1$ and computes their sum, which is
a tensor of rank $\le r$. The closure of the image of $\pi$ is what is called a higher
secant variety in the language of algebraic geometry [12, Example 11.30].

Corollary 20. The naive Bayes model with $r$ classes and $n$ features corresponds
to the $r$-th secant variety of a Segre product of $n$ projective spaces:

$$\overline{\pi(V(P_G))} = \mathrm{Sec}^r(X_{d_1,d_2,\ldots,d_n}). \qquad (16)$$

The case $n = 2$ of two features is a staple of classical projective geometry.
In that special case, the image of $\pi$ is closed, and $\pi(V(P_G)) = \mathrm{Sec}^r(X_{d_1,d_2})$
consists of all real $d_1 \times d_2$-matrices of rank at most $r$. This variety has
codimension $(d_1 - r)(d_2 - r)$, provided $r \le \min(d_1, d_2)$. Its ideal $Q_G$ is generated
by the $(r+1) \times (r+1)$-minors of the $d_1 \times d_2$ matrix $(p_{ij+})$. The dimension
formula of Settimi and Smith [18, Theorem 1] follows immediately. For
instance, in the case of two ternary features ($d_1 = d_2 = 3$, $r = 2$), discussed in
different guises in [18, §4.2] and [12, Example 11.26], the observable space is
the cubic hypersurface defined by the $3 \times 3$-determinant $\det(p_{ij+})$.
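With two ternary features and $r = 2$ classes, the observable matrix $(p_{ij+})$ is a sum of two rank-one matrices, so its determinant vanishes. The sketch below draws hypothetical rational model parameters and confirms this with exact arithmetic. The helper names are illustrative. The example only shows membership in the cubic hypersurface and does not address the non-negative rank question.

```python
import random
from fractions import Fraction


def random_distribution(size, rng):
    """A random probability vector with rational entries."""
    raw = [rng.randint(1, 9) for _ in range(size)]
    total = sum(raw)
    return [Fraction(x, total) for x in raw]


def naive_bayes_table(class_probs, feature1, feature2):
    """Observable table p_{ij+} = sum_k P(H=k) P(F1=i | H=k) P(F2=j | H=k).

    feature1[k] and feature2[k] are the conditional distributions for class k.
    """
    d1, d2 = len(feature1[0]), len(feature2[0])
    return [[sum(w * feature1[k][i] * feature2[k][j] for k, w in enumerate(class_probs))
             for j in range(d2)] for i in range(d1)]


def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


rng = random.Random(0)
r = 2  # number of classes
class_probs = random_distribution(r, rng)
feature1 = [random_distribution(3, rng) for _ in range(r)]
feature2 = [random_distribution(3, rng) for _ in range(r)]
P = naive_bayes_table(class_probs, feature1, feature2)
print(sum(map(sum, P)), det3(P))  # 1 0
```

Exact `Fraction` arithmetic is used so that the determinant is exactly zero rather than zero up to rounding.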

The leftmost inclusion in (13) leads to difficult open problems even for
$n = 2$ features. Here, $\pi(V(P_G))_{\ge 0}$ is the set of all non-negative $d_1 \times d_2$-matrices
of rank at most $r$, while $\pi(V_{\ge 0}(P_G))$ is the subset consisting of
all matrices of non-negative rank at most $r$. Their difference consists of
non-negative matrices of rank $\le r$ which cannot be written as the sum of $r$
non-negative matrices of rank 1. In spite of recent progress by Barradas and
Solis, there is still no practical algorithm for computing the non-negative
rank of a $d_1 \times d_2$-matrix. Things get even harder for $n \ge 3$, when testing
membership in $\pi(V_{\ge 0}(P_G))$ means computing non-negative tensor rank.

We next discuss what is known about the case of $n \ge 3$ features. The
expected dimension of the secant variety (16) is

$$r \cdot (d_1 + d_2 + \cdots + d_n - n + 1) - 1. \qquad (17)$$

This number is always an upper bound. Characterizing the cases
$(d_1, \ldots, d_n; r)$ in which the dimension is less than the expected dimension is
an interesting problem that has been studied in the statistics literature. We
note that the results on dimension in [7] are all special cases of results by
Catalisano, Geramita and Gimigliano. The results on singularities in
[7] follow from the geometric fact that the $r$-th secant variety of any projective
variety is always singular along the $(r-1)$-st secant variety. The statistical
problem of identifiability, addressed in [7], is related to the beautiful
work of Strassen [20] on tensor rank, notably his Theorem 2.7 on optimal
computations.

In Table 3 we display the range of straightforward Macaulay2 computations
when $\dim(X)$ is small. First consider the case of two
classes ($r = 2$), which corresponds to secant lines on $X = \mathbb{P}^{d_1-1} \times \cdots \times \mathbb{P}^{d_n-1}$.
In each of these cases, the ideal $Q_G$ is generated by cubic polynomials,
and each of these cubic generators is a $3 \times 3$-subdeterminant of a two-dimensional
matrix obtained by flattening the tensor $(p_{i_1i_2\cdots i_n})$. The column labeled
"cubics" lists the number of minimal generators. For example, in the case
$d_1 = d_2 = d_3 = 3$, we can flatten $(p_{ijk})$ in three possible ways to a $3 \times 9$-matrix,
and these have $3 \cdot \binom{9}{3} = 252$ maximal subdeterminants. The vector
space spanned by these subdeterminants has dimension 222, the listed number
of minimal generators. The column "degree" lists the degree of the
projective variety $\mathrm{Sec}^2(X)$, which is 783 in the previous example.

dim(X)   dim(Sec^2(X))   d_1 d_2 ... d_n   (d_1, ..., d_n)   degree   cubics
4        9               12                (2,2,3)           6        4
4        9               16                (2,2,2,2)         64       32
5        11              16                (2,2,4)           20       16
5        11              18                (2,3,3)           57       36
5        11              24                (2,2,2,3)         526      184
5        11              32                (2,2,2,2,2)       3256     768
6        13              20                (2,2,5)           50       40
6        13              24                (2,3,4)           276      120
6        13              27                (3,3,3)           783      222
6        13              32                (2,2,2,4)         2388     544
6        13              36                (2,2,3,3)         6144     932

Table 3: The prime ideal defining the secant lines to the Segre variety (15)

These computational results in Table 3 lead us to make the following conjecture:

Conjecture 21. The prime ideal $Q_G$ of any naive Bayes model $G$ with $r = 2$
classes is generated by the $3 \times 3$-subdeterminants of any two-dimensional
table obtained by flattening the $n$-dimensional table $(p_{i_1i_2\cdots i_n})$.
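The conjectured generators can be enumerated directly. Each way of splitting the $n$ axes into two nonempty groups gives a flattening, and every $3 \times 3$ subdeterminant of every flattening is a candidate cubic. The sketch below only counts these subdeterminants with repetition. It does not compute the dimension of their span, and the function names are hypothetical.

```python
from itertools import combinations
from math import comb, prod


def flattenings(dims):
    """Yield (row_axes, rows, cols) for every split of the axes into two nonempty
    groups; axis 0 always stays on the row side, so transposes are not repeated."""
    n = len(dims)
    for k in range(n - 1):
        for extra in combinations(range(1, n), k):
            row_axes = (0,) + extra
            rows = prod(dims[a] for a in row_axes)
            cols = prod(dims[a] for a in range(n) if a not in row_axes)
            yield row_axes, rows, cols


def count_3x3_minors(dims):
    """Total number of 3x3 subdeterminants over all flattenings (with repetitions)."""
    return sum(comb(rows, 3) * comb(cols, 3) for _, rows, cols in flattenings(dims))


for dims in [(2, 2, 3), (2, 2, 4), (2, 2, 5), (3, 3, 3)]:
    print(dims, count_3x3_minors(dims))
```

For $(2,2,3)$, $(2,2,4)$ and $(2,2,5)$ the raw counts 4, 16 and 40 agree with the "cubics" column of Table 3. For $(3,3,3)$, the 252 subdeterminants span only the 222-dimensional space described in the text.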

It was proved by Catalisano, Geramita and Gimigliano that the variety
$\mathrm{Sec}^2(X)$ always has the expected dimension (17) when $r = 2$. A well-known
example (see page 221) where the dimension is less than expected
occurs for three classes and four binary features ($r = 3$, $n = 4$, $d_1 = d_2 =
d_3 = d_4 = 2$). Here (17) evaluates to 14, but $\dim(\mathrm{Sec}^3(X)) = 13$ for
$X = \mathbb{P}^1 \times \mathbb{P}^1 \times \mathbb{P}^1 \times \mathbb{P}^1$. The corresponding ideal $Q_G$ is a complete intersection
generated by any two of the three $4 \times 4$-determinants obtained by flattening
the $2 \times 2 \times 2 \times 2$-table $(p_{ijkl})$. The third is a signed sum of the other two.
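Formula (17) is easy to evaluate. The sketch below adds the usual cap at the dimension of the ambient space $\mathbb{P}^{d_1 \cdots d_n - 1}$; this cap is our addition and is not part of (17) as printed. The sketch reproduces the first two columns of Table 3 and the value 14 quoted for the defective binary example. The function names are hypothetical.

```python
from math import prod


def segre_dim(dims):
    """dim X for X = P^{d_1 - 1} x ... x P^{d_n - 1}."""
    return sum(d - 1 for d in dims)


def expected_secant_dim(dims, r):
    """Expected dimension (17) of Sec^r(X), capped at the dimension of the
    ambient projective space P^{d_1 ... d_n - 1} (the cap is an added assumption)."""
    return min(r * (sum(dims) - len(dims) + 1) - 1, prod(dims) - 1)


# A few rows of Table 3 (r = 2): (dims, dim X, dim Sec^2(X))
for dims, dim_x, dim_sec in [((2, 2, 3), 4, 9), ((2, 2, 2, 2), 4, 9), ((3, 3, 3), 6, 13)]:
    assert segre_dim(dims) == dim_x and expected_secant_dim(dims, 2) == dim_sec

print(expected_secant_dim((2, 2, 2, 2), 3))  # 14, whereas the true dimension is 13
print(expected_secant_dim((3, 3, 3), 4))     # 26
```

For three ternary features with $r = 4$, the formula gives 26, the full ambient dimension. Proposition 22 shows, however, that $Q_G$ is then a principal ideal, so $\mathrm{Sec}^4(\mathbb{P}^2 \times \mathbb{P}^2 \times \mathbb{P}^2)$ is a hypersurface of dimension 25. This is a second instance where the expected dimension is not attained.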

The problem of identifying explicit generators of $Q_G$ is much more difficult
when $r \ge 3$, i.e., when the hidden variable has three or more levels. We
present the complete solution for the case of three ternary features. Here
$(p_{ijk})$ is an indeterminate $3 \times 3 \times 3$-tensor which we wish to write as a sum
of $r$ rank one tensors. The following solution is derived from a result of
Strassen [20, Theorem 4.6]. Let $A = (p_{ij1})$, $B = (p_{ij2})$ and $C = (p_{ij3})$ be
three $3 \times 3$-matrices obtained by taking slices of the $3 \times 3 \times 3$-table $(p_{ijk})$.
Proposition 22. Let $Q_G$ be the ideal of $\mathrm{Sec}^r(\mathbb{P}^2 \times \mathbb{P}^2 \times \mathbb{P}^2)$, the naive
Bayes model with $n = 3$ ternary features and $r$ classes. If $r = 2$, then $Q_G$
is generated by the cubics described in Conjecture 21. If $r = 3$, then $Q_G$ is
generated by the quartic entries of the various $3 \times 3$-matrices of the form
$A \cdot \mathrm{adj}(B) \cdot C - C \cdot \mathrm{adj}(B) \cdot A$. If $r = 4$, then $Q_G$ is the principal ideal generated
by the following homogeneous polynomial of degree 9 with 9,216 terms:

$$\det(B)^2 \cdot \det(A \cdot B^{-1} \cdot C - C \cdot B^{-1} \cdot A).$$

If $r \ge 5$, then $Q_G$ is the zero ideal.
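The $r = 3$ and $r = 4$ cases of Proposition 22 can be tested numerically. If $M = A \cdot \mathrm{adj}(B) \cdot C - C \cdot \mathrm{adj}(B) \cdot A$, then $\det(M) = \det(B)^3 \det(A B^{-1} C - C B^{-1} A)$ whenever $B$ is invertible. Hence $M$ should be the zero matrix for a sum of three rank-one tensors, and $\det(M)$ should vanish for a sum of four. The sketch below uses random integer tensors and exact integer arithmetic. It is an illustrative test with hypothetical helper names, not a proof.

```python
import random


def matmul(x, y):
    return [[sum(x[i][t] * y[t][j] for t in range(3)) for j in range(3)] for i in range(3)]


def matsub(x, y):
    return [[x[i][j] - y[i][j] for j in range(3)] for i in range(3)]


def det3(m):
    (a, b, c), (d, e, f), (g, h, k) = m
    return a * (e * k - f * h) - b * (d * k - f * g) + c * (d * h - e * g)


def adj3(m):
    """Adjugate (transposed cofactor matrix) of a 3x3 matrix."""
    (a, b, c), (d, e, f), (g, h, k) = m
    return [[e * k - f * h, c * h - b * k, b * f - c * e],
            [f * g - d * k, a * k - c * g, c * d - a * f],
            [d * h - e * g, b * g - a * h, a * e - b * d]]


def slices_of_random_tensor(r, rng):
    """Slices A = (p_ij1), B = (p_ij2), C = (p_ij3) of a sum of r integer rank-one tensors."""
    terms = [[[rng.randint(-5, 5) for _ in range(3)] for _ in range(3)] for _ in range(r)]
    return [[[sum(a[i] * b[j] * c[s] for a, b, c in terms) for j in range(3)]
             for i in range(3)] for s in range(3)]


def strassen_matrix(A, B, C):
    """A adj(B) C - C adj(B) A; its entries are the quartics of Proposition 22."""
    return matsub(matmul(matmul(A, adj3(B)), C), matmul(matmul(C, adj3(B)), A))


rng = random.Random(1)
A3, B3, C3 = slices_of_random_tensor(3, rng)
A4, B4, C4 = slices_of_random_tensor(4, rng)
print(strassen_matrix(A3, B3, C3))        # zero matrix for r = 3
print(det3(strassen_matrix(A4, B4, C4)))  # 0 for r = 4
```

The adjugate form avoids division, so every quantity stays an exact Python integer.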
Appendix: Techniques for Primary Decomposition

The ideals in this paper present a challenge for present-day computer algebra
systems. Their large number of variables (e.g. 32 in Section 6), combined
with the sometimes long polynomials which arise, makes them difficult to handle with
built-in primary decomposition algorithms. Even the standard implementations
of factorization of multivariate polynomials have difficulty with some of
the long polynomials. This is only a problem with current implementations,
which are generally not optimized for large numbers of variables.

For the computations performed in Sections 5 and 6, it was necessary
to write special code (in Macaulay2) in order to compute the components
and primary decompositions of these ideals. We also have some code in
Macaulay2 or Singular for generating the ideals $I_{\mathrm{local}}(G)$ or $I_{\mathrm{global}}(G)$ from
the graph $G$ and the integers $d_1, d_2, \ldots, d_n$. In this appendix we indicate
some techniques and tricks that were used to compute with these ideals.

The first modification, which simplifies the problems dramatically, is to
change coordinates so that the indeterminates are $p_{2u_2\cdots u_n}$ and $p_{+u_2\cdots u_n}$,
instead of $p_{u_1\cdots u_n}$. This change of variables sometimes takes a Markov ideal
into a binomial ideal, which is generally much simpler to compute with.
Computing any one Grobner basis, ideal quotient, or intersection of our
ideals is not too difficult. Therefore, our algorithms make use of these
operations. All ideals examined in this project have the property that
every component is rational. The distinguished component $\ker(\Phi)$ is more
complicated than any of the other components, in terms of the number of
generators and their degrees, and it cannot be computed by implicitization.

The first problem is to decide whether an ideal is prime (i.e. whether 
it equals the unknown ideal $\ker(\Phi)$). There are several known methods for
deciding primality (see 0| for a nice exposition). The standard method is to 
reduce to a zero-dimensional problem. This entails either a generic change 
of coordinates, or factorization over extension fields. We found that the 
current implementations of these methods fail for the majority of the 301 
examples in Section 6. The technique that did work for us is to search for 
birational projections. This either produces a zero divisor, or a proof that 
the ideal is prime. It can sometimes be used to count the components (both 
minimal and embedded), without actually producing the components. 
The following result is proved by localizing with respect to powers of $g$.
This defines a birational projection $(x_1, x_2, \ldots, x_n) \mapsto (x_2, \ldots, x_n)$ for $J$.

Proposition 23. Let $J \subset \mathbb{R}[x_1, \ldots, x_n]$ be an ideal containing a polynomial
$f = g x_1 + h$, with $g, h$ not involving $x_1$, and $g$ a non-zero divisor modulo
$J$. Let $J_1 = J \cap \mathbb{R}[x_2, \ldots, x_n]$ be the elimination ideal. Then

(a) $J = (\langle J_1, g x_1 + h \rangle : g^\infty)$,

(b) $J$ is prime if and only if $J_1$ is prime,

(c) $J$ is primary if and only if $J_1$ is primary,

(d) any irredundant primary decomposition of $J_1$ lifts to an irredundant
primary decomposition of $J$.
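The following is a sketch of the localization argument for part (a). It is our reconstruction under the stated hypotheses, not the original proof. Write $R = \mathbb{R}[x_1, \ldots, x_n]$, $R' = \mathbb{R}[x_2, \ldots, x_n]$, and let the subscript $g$ denote inverting $g$.

$$\begin{aligned}
J &= J : g^\infty = J R_g \cap R && \text{since $g$ is a non-zero divisor modulo $J$,}\\
x_1 &\equiv -h/g \pmod{J R_g} && \text{since } f = g x_1 + h \in J,\\
J R_g &= J_1 R_g + \langle g x_1 + h \rangle R_g && \text{substitute } x_1 \mapsto -h/g \text{ and use } J R_g \cap R'_g = J_1 R'_g,\\
J &= \langle J_1, g x_1 + h \rangle : g^\infty && \text{contracting back to } R.
\end{aligned}$$

The same substitution gives $R_g / J R_g \cong R'_g / J_1 R'_g$, which is the source of (b) and (c).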

Our algorithm to check primality starts by searching for variables which
occur linearly, checking that the lead coefficient is not a zero divisor, and
then eliminating that variable as in Proposition 23. In almost all of the
Markov ideals that we have studied, iterative use of this technique proves
or disproves primality. A priori, one might not be able to find a birational
projection at all, but this never happened for any of our examples.

The second problem is to compute the minimal primes or the primary 
decomposition. Finding the minimal primes is the first step in computing a 
primary decomposition, using the technique of fW|, which is implemented in 
several computer algebra systems, including Macaulay2. Here, we have not 
found a single method that always works best. One method that worked in 
most cases is based on splitting the ideal into two parts. Given an ideal $I$,
if there is an element $f$ of its Grobner basis which factors as $f = f_1 f_2$, then

$$\sqrt{I} = \sqrt{I + \langle f_1 \rangle} \cap \sqrt{(I + \langle f_2 \rangle) : f_1^\infty}.$$

We keep a list of ideals whose intersection has the same radical as $I$. We
process this list of ideals in ascending order of codimension. For each
ideal, we keep a list of the elements that we have inverted so far (e.g. $f_1$
in the ideal $((I + \langle f_2 \rangle) : f_1^\infty)$), and saturate at each step with these elements.
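The splitting step rests on a simple observation. A prime containing $I$ contains $f_1 f_2$, and hence contains $f_1$ or $f_2$. Assume $f_1 f_2 \in I$ and work over an algebraically closed field. Then the zero sets satisfy

$$V(I) = V(I + \langle f_1 \rangle) \,\cup\, \overline{V(I + \langle f_2 \rangle) \setminus V(f_1)}, \qquad \overline{V(J) \setminus V(f_1)} = V(J : f_1^\infty),$$

and taking vanishing ideals via the Nullstellensatz gives the displayed identity for $\sqrt{I}$.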

If there is no element which factors, then we search for a variable to
birationally project away from, as in Proposition 23. If its lead coefficient $g$
is a zero divisor, we use this element to split the ideal via

$$\sqrt{I} = \sqrt{I + \langle g \rangle} \cap \sqrt{I : g^\infty}.$$

As we go, we only process ideals which do not contain the intersection of all
known components computed so far.

If we cannot find any birational projection or reducible polynomial, then
we have no choice but to decompose the ideal using the built-in routines,
which are based on characteristic sets. However, in none of the examples of
this paper was this final step reached. This method works in a reasonable
amount of time for all but about 10 to 15 of the 301 ideals in Section 6.